Thread: Limitations on R and other software options

  1. #1
    I am trying to use statistical software to model data from a PostgreSQL relational database and am running into size limitation problems. When I try to load more than 100,000 entries into R using either read.table or scan, it drops all but the first 100,000. What's the best option to get around this, as I will need to run analysis on potentially several million data entries? Do I need to find some way to export the statistics into my database, is there a package for R that can handle more data, or should I consider something like SAS (and how much would a license cost)? Or should I even consider some mathematical software like Matlab? Thanks for the help.

  2. #2
    TheEcologist (R purist), The Netherlands
    Quote: Originally posted by mcolaco
    I am trying to use statistical software to model data from a PostgreSQL relational database and am running into size limitation problems. When I try to load more than 100,000 entries into R using either read.table or scan, it drops all but the first 100,000. What's the best option to get around this, as I will need to run analysis on potentially several million data entries? Do I need to find some way to export the statistics into my database, is there a package for R that can handle more data, or should I consider something like SAS (and how much would a license cost)? Or should I even consider some mathematical software like Matlab? Thanks for the help.
    So after you have loaded the data into an object, e.g. A = read.table("a PostgreSQL relational database"), you type dim(A) [or length(A) if it is a vector] and see that it only loaded the first 100,000?

    That's very strange. I've never had any problems of the sort, and I have worked with data frames of about 1e+06 x 8 elements in R. R should be able to handle even bigger files; see:

    http://stat.ethz.ch/R-manual/R-patch...ry-limits.html

    and

    http://stat.ethz.ch/R-manual/R-patch...ject.size.html
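
    To get a rough feel for the scale, here is a minimal sketch that builds a data frame of the size mentioned above (1e+06 rows by 8 numeric columns) and asks R how much memory it takes. The random values and the object name big are made up for illustration; real data with character or factor columns will have a different footprint.

    ```r
    # Hypothetical data: 1e6 rows x 8 numeric columns of random values.
    n <- 1e6
    big <- as.data.frame(matrix(rnorm(n * 8), nrow = n, ncol = 8))

    # Rows and columns actually held in memory.
    dim(big)

    # Memory footprint in bytes; 8e6 doubles at 8 bytes each is at least 64e6 bytes.
    size_bytes <- as.numeric(object.size(big))

    # The same figure in a human-readable unit.
    format(object.size(big), units = "MB")
    ```

    A few million rows of numeric data on this pattern should usually fit in memory on an ordinary desktop, though that depends on how many columns you have and how much RAM is free.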

    Do you get any error messages? What are your system specs?

    The problem might be related to read.table, though I doubt it.

    You can try read.csv() or read.csv2().
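
    As a quick sanity check, a sketch like the following writes more than 100,000 rows to a temporary CSV file and reads them back with read.csv(). The file, the column names id and value, and the row count of 250,000 are all assumptions for the sake of the example, not your actual data.

    ```r
    # Write a hypothetical 250,000-row table to a temporary CSV file.
    path <- tempfile(fileext = ".csv")
    n_rows <- 250000
    write.csv(data.frame(id = seq_len(n_rows), value = runif(n_rows)),
              path, row.names = FALSE)

    # Read it back into an object rather than printing it to the console.
    A <- read.csv(path)

    # Check how many rows and columns actually arrived.
    dim(A)
    all_rows_read <- nrow(A) == n_rows

    # Clean up the temporary file.
    unlink(path)
    ```

    If dim(A) reports every row here but not for your own file, the file itself (quoting, separators, comment characters) is a more likely suspect than a size limit.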

    Also, have you tried posting your question to the R-Help mailing list?
    They will be more than happy to explain (in detail) what could be wrong.

    Before you post there, however, please search the archives:

    http://finzi.psych.upenn.edu/search.html

    Good luck,
    The true ideals of great philosophies always seem to get lost somewhere along the road..

  3. #3

    Here's the error I get.

    [ reached getOption("max.print") -- omitted 150887 rows ]]

    I'll try posting on the R-help forum. I just thought I would probe for people's opinions on other options if R has some limitation in data.

    I'll do some more troubleshooting myself, if you've been able to handle so much data in R. Thanks.

  4. #4
    TheEcologist
    I think I understand your problem now. When loading a dataset you need to save it as an object.

    In R:

    >mydata = read.table("datalocation")

    If you just type:

    >read.table("datalocation")

    R will try to "print" the whole dataset on your screen and not save it in an object. There is a maximum limit to how much you may "print" to your screen.

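    You can change this limit via options(max.print=...). For example, after options(max.print=64) R prints at most about 64 entries. Note that the limit counts entries, not rows, so a data frame with 4 columns would show only 16 rows.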
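
    Here is a small sketch of that behaviour. The data frame df and its columns are invented for the example, and capture.output() is only used so the printed text can be inspected instead of scrolling past. Keep in mind that max.print counts entries rather than rows, so with 4 columns a limit of 64 leaves room for 64 %/% 4 = 16 rows.

    ```r
    # Lower the print limit and remember the previous setting.
    old <- options(max.print = 64)

    # Hypothetical data frame: 100 rows x 4 columns.
    df <- data.frame(a = 1:100, b = 101:200, c = 201:300, d = 301:400)

    # Capture what print() would show instead of flooding the console.
    shown <- capture.output(print(df))

    # The last captured line is the truncation notice with the omitted row count.
    notice <- shown[length(shown)]
    notice

    # The object itself is untouched: it still has all 100 rows.
    nrow(df)

    # Restore the previous print limit.
    options(old)
    ```

    The notice only describes what was left off the screen. Nothing has been dropped from the data.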

    However printing to screen is not very informative when you have a huge dataset.

    Rather, save your dataset as an object, then check that it's the right size via dim(your_object).
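
    A minimal sketch of that workflow, assuming a whitespace-separated file with a header row. The temporary file and the columns x and y are stand-ins for your real "datalocation".

    ```r
    # Create a hypothetical 250,000-row file to stand in for the real data.
    path <- tempfile(fileext = ".txt")
    write.table(data.frame(x = seq_len(250000), y = rnorm(250000)),
                path, row.names = FALSE, quote = FALSE)

    # Assign the result to an object so nothing is printed to the screen.
    mydata <- read.table(path, header = TRUE)

    # Inspect the object without printing all of it.
    dim(mydata)       # number of rows and columns
    str(mydata)       # column names and types
    head(mydata, 5)   # first five rows only

    # Clean up the temporary file.
    unlink(path)
    ```

    dim(), str() and head() tell you far more about a large dataset than a full printout would.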

    Did I guess your problem correctly?

  5. #5

    Yep, that does it

    Even when I save the data as an object, I still get some warnings. However, I don't believe those have to do with the size of the object.

    Thanks for the help.

  6. #6
    TheEcologist

    no problem
