
Thread: What to do when number of variables is greater than number of observations?

  1. #1
    Points: 3,846, Level: 39
    Level completed: 31%, Points required for next Level: 104

    Posts
    50
    Thanks
    0
    Thanked 0 Times in 0 Posts

    What to do when number of variables is greater than number of observations?




    What are some techniques that are utilized when the number of predictors is greater than the number of observations?

  2. #2
    Points: 462, Level: 9
    Level completed: 24%, Points required for next Level: 38

    Location
    Baton Rouge, LA
    Posts
    11
    Thanks
    1
    Thanked 2 Times in 2 Posts

    Re: What to do when number of variables is greater than number of observations?

    All subsets regression is one approach that is used in this case. The technique involves fitting every possible linear model, that is, every subset of the predictors at every model size. Forward stepwise regression is sometimes used in this situation as well.
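    A minimal plain-Python sketch of forward stepwise selection, to show why it stays feasible when there are more predictors than observations: it only ever fits small models, adding one predictor at a time. The function names, the fixed `max_terms` stopping rule and the simulated data are illustrative assumptions; in practice the number of steps would be chosen by a criterion or by validation.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns `cols` of X."""
    rows = [[1.0] + [X[i][j] for j in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    try:
        beta = solve(xtx, xty)
    except ValueError:
        return float("inf")  # collinear candidate set: treat it as unusable
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_stepwise(X, y, max_terms):
    """Greedily add the predictor that most reduces the RSS, stopping at max_terms."""
    n, p = len(y), len(X[0])
    max_terms = min(max_terms, n - 2)  # keep at least one residual degree of freedom
    chosen = []
    while len(chosen) < max_terms:
        candidates = [j for j in range(p) if j not in chosen]
        best = min(candidates, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
    return chosen


# Simulated data with more predictors (20) than observations (12);
# only columns 2 and 5 actually drive y.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(20)] for _ in range(12)]
y = [10 * row[2] + row[5] for row in X]
print(forward_stepwise(X, y, max_terms=2))
```

    Each step only solves a system as large as the current model, which is what keeps the cost manageable compared with searching every subset.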

  3. #3
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do when number of variables is greater than number of observations?

    1) Gather more data (probably the best way)
    2) Remove or collapse variables. For example, you may decide that one variable really is not critical, or that one variable can serve to measure two, and so on. One possibility is to create an index variable that adds several of your variables together. If you have Likert scale data, a second advantage of this type of combination is that the results will likely be interval, while (according to some statisticians, anyhow) Likert data normally are not.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
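    A minimal sketch of the index-variable idea: several Likert items (coded 1 to 5 here) are summed into one score per respondent, so a block of items enters the model as a single predictor. The item names and responses are hypothetical.

```python
def index_score(responses, items):
    """Return one summed score per respondent over the listed Likert items."""
    return [sum(answers[item] for item in items) for answers in responses]


# Hypothetical survey: each dict holds one respondent's 1-5 answers.
responses = [
    {"q1": 4, "q2": 5, "q3": 3, "q4": 2},
    {"q1": 2, "q2": 1, "q3": 2, "q4": 5},
    {"q1": 3, "q2": 3, "q3": 4, "q4": 1},
]

# Collapse three related items into a single hypothetical "satisfaction" index.
satisfaction = index_score(responses, ["q1", "q2", "q3"])
print(satisfaction)  # [12, 5, 10]
```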

  4. #4
    Points: 3,846, Level: 39
    Level completed: 31%, Points required for next Level: 104

    Posts
    50
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: What to do when number of variables is greater than number of observations?

    I tried using subset selection in R, but I have over 500 variables. R just hangs when I use the leaps package.
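    A quick count shows why exhaustive subset search stalls at this size: with p candidate predictors there are 2^p possible subsets (the sum of C(p, k) over every model size k, including the empty model), and for p = 500 that number has 151 digits. This is only arithmetic about the size of the search space; it does not describe how leaps itself organizes its search.

```python
import math

p = 500  # number of candidate predictors, as in the post above

# Number of subsets of each size k, summed over all sizes 0..p.
total = sum(math.comb(p, k) for k in range(p + 1))

assert total == 2 ** p   # every predictor is either in or out of a model
print(len(str(total)))   # number of decimal digits: 151
print(f"{total:.3e}")    # about 3.273e+150
```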

  5. #5
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do when number of variables is greater than number of observations?

    If you have 500 variables (something that I find amazing), you might try factor analysis and use the factors rather than the variables in your model, if that makes conceptual sense.

  6. #6
    Super Moderator
    Points: 13,151, Level: 74
    Level completed: 76%, Points required for next Level: 99
    Location
    Illinois, US
    Posts
    2,014
    Thanks
    0
    Thanked 223 Times in 192 Posts

    Re: What to do when number of variables is greater than number of observations?

    I would also suggest all possible subsets using Mallows' C(p) as a criterion. SAS handles this well.
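    For reference, the usual definition of Mallows' C(p) for a candidate least-squares model with p parameters (intercept included), fitted to n observations, is below. SSE_p is that model's residual sum of squares and \hat{\sigma}^2 is the residual mean square of the model containing all P candidate predictors; subsets with a small C(p) close to p are usually preferred. When there are more predictors than observations, that full model leaves no residual degrees of freedom (n - P - 1 <= 0), so \hat{\sigma}^2 would have to be estimated some other way.

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p,
\qquad
\hat{\sigma}^2 = \frac{\mathrm{SSE}_{\mathrm{full}}}{n - P - 1}
```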

  7. #7
    TS Contributor
    Points: 6,786, Level: 54
    Level completed: 18%, Points required for next Level: 164

    Location
    Sweden
    Posts
    524
    Thanks
    44
    Thanked 112 Times in 100 Posts

    Re: What to do when number of variables is greater than number of observations?

    Is it even possible to estimate the parameters if the number of variables is greater than the number of observations? I guess the degrees of freedom get below 0 in this case, which means that you cannot find any solution (or are there infinitely many solutions? I don't remember).
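    On the question in parentheses: when the predictors outnumber the observations (and the data are otherwise generic), the equations X beta = y do not lack a solution; they have infinitely many, all fitting the data exactly. A toy illustration with made-up numbers, two observations and three coefficients (no intercept):

```python
# Two observations (rows) and three predictors (columns): more unknowns than equations.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
y = [3.0, 2.0]


def fitted(beta):
    """Fitted values X @ beta for a coefficient vector beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


beta_a = [1.0, 1.0, 1.0]  # one exact solution
null = [2.0, -1.0, 1.0]   # X @ null == [0, 0], so any multiple can be added freely

for t in [-2.0, 0.0, 1.0, 3.5]:
    beta_t = [a + t * v for a, v in zip(beta_a, null)]
    print(beta_t, fitted(beta_t))  # every beta_t reproduces y exactly
```

    Because the data cannot tell these coefficient vectors apart, the individual parameters are not identified, even though the fit itself is perfect.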

  8. #8
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do when number of variables is greater than number of observations?

    As far as I know, you cannot estimate a model with more parameters than observations. Unique parameters cannot be estimated with 0 or negative DF (I don't understand conceptually what a negative DF even means; it is sort of like something dissolving before it enters solution in a pH problem).
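    One way to read a "negative DF": for ordinary least squares with an intercept, the residual degrees of freedom are n - p - 1, the observations left over after estimating p slopes and one intercept. A negative value just counts how many more unknowns than equations there are. A minimal sketch using the numbers the original poster gives elsewhere in the thread (200 training observations, at least 500 predictors):

```python
def residual_df(n_obs, n_predictors, intercept=True):
    """Residual degrees of freedom for an OLS fit: observations minus estimated coefficients."""
    return n_obs - n_predictors - (1 if intercept else 0)


print(residual_df(200, 500))  # -301: 301 more coefficients than observations
```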

  9. #9
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: What to do when number of variables is greater than number of observations?

    Poster,

    You have received some great suggestions here. Two questions. First, are you testing these potential independent variables because they make reasonable sense as predictors, or is this a fishing expedition? Second, can you share the context of the scenario? (This might open the door for others who perform comparable research to describe how they would apply their techniques to your situation.)
    Stop cowardice, ban guns!

  10. #10
    ggplot2orBust
    Points: 71,220, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    Location
    Buffalo, NY
    Posts
    4,417
    Thanks
    1,811
    Thanked 928 Times in 809 Posts

    Re: What to do when number of variables is greater than number of observations?

    This will go down in history as the 1000th post of hlsmith!!!!!!!!!!!!!!!!!!!!!!!!!
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  11. The Following User Says Thank You to trinker For This Useful Post:

    hlsmith (02-15-2013)

  12. #11
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: What to do when number of variables is greater than number of observations?

    Other possibilities are PCR (which is built on PCA) and PLS. If hlsmith had not cut down on post length, I would have told you what these abbreviations mean and given some application examples. I don't want to leave a "monster" to read.
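    For readers unfamiliar with the abbreviations: PCR is principal component regression, which regresses the response on a few principal component (PCA) scores of the predictors instead of on the predictors themselves; PLS is partial least squares, which builds its components using the response as well. Below is a minimal plain-Python sketch of PCR with a single component found by power iteration. The function names and toy data are assumptions for illustration; a real analysis would use a library and choose the number of components by validation.

```python
import math


def pcr_fit(X, y, iters=200):
    """Principal component regression on the first principal component only."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]

    # Power iteration on Xc^T Xc without forming the p x p matrix: v <- Xc^T (Xc v).
    # Assumes the predictors are not all constant (otherwise norm would be 0).
    v = [1.0] * p
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Simple regression of y on the (already centered) component scores.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    y_mean = sum(y) / n
    slope = sum(s * (yi - y_mean) for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    return means, v, y_mean, slope


def pcr_predict(model, x):
    """Predict y for a new predictor vector x."""
    means, v, y_mean, slope = model
    score = sum((xj - mj) * vj for xj, mj, vj in zip(x, means, v))
    return y_mean + slope * score


# Toy data: 4 observations, 6 predictors, all driven by one latent value z.
loadings = [1.0, 2.0, 2.0, 0.0, 1.0, 3.0]
z = [1.0, -1.0, 2.0, 0.5]
X = [[zi * l for l in loadings] for zi in z]
y = [5.0 * zi + 1.0 for zi in z]

model = pcr_fit(X, y)
print([round(pcr_predict(model, row), 6) for row in X])
```

    Only one slope is estimated here, however many predictors there are, which is why this family of methods can be fitted when predictors outnumber observations.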

  13. #12
    Points: 3,846, Level: 39
    Level completed: 31%, Points required for next Level: 104

    Posts
    50
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: What to do when number of variables is greater than number of observations?

    Whenever I run factanal I get this error:
    Error in solve.default(cv) : system is computationally singular: reciprocal condition number = 3.80806e-21
    This means that I cannot use factor analysis.
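    A likely reason for that error: with n observations, the sample covariance matrix of any number of variables has rank at most n - 1, so whenever there are at least as many variables as observations it is singular and cannot be inverted (the message suggests `factanal` calls `solve()` on it, so presumably that inversion is what fails). A small plain-Python sketch with made-up numbers, 3 observations on 5 variables:

```python
def covariance(X):
    """Sample covariance matrix (divisor n - 1) of the columns of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(D[i][a] * D[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def matrix_rank(a, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in a]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank


# 3 observations on 5 variables (made-up numbers).
X = [[1, 2, 0, 3, 1],
     [2, 0, 1, 1, 4],
     [0, 1, 3, 2, 2]]

S = covariance(X)
print(len(S), matrix_rank(S))  # a 5 x 5 matrix of rank at most 2, hence singular
```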

  14. #13
    Points: 3,846, Level: 39
    Level completed: 31%, Points required for next Level: 104

    Posts
    50
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: What to do when number of variables is greater than number of observations?

    Do you mean that I should run a logistic regression with Mallows' Cp as the criterion?

  15. #14
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do when number of variables is greater than number of observations?

    I don't know that code, but my guess is that you cannot run the EFA with the number of observations you have. You need enough data to calculate unique parameters, and I don't think you have that. Is it possible to gather more data?

  16. #15
    Points: 3,846, Level: 39
    Level completed: 31%, Points required for next Level: 104

    Posts
    50
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: What to do when number of variables is greater than number of observations?


    I have 12000 observations and can only use 200 of them for the training set. The rest are for the test set, so I cannot gather more data. The goal is to build a predictive model that has good predictive accuracy on the test set.
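    A minimal sketch of the evaluation setup described here, assuming the 200 training rows are drawn at random (the post does not say how they are chosen) and a binary outcome (as the logistic-regression question earlier suggests). The seed and the helper names are illustrative; the model fitted on the training rows could be any of the approaches discussed in this thread.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly split row indices 0..n_total-1 into training and test sets."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def accuracy(y_true, y_pred):
    """Share of test cases whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = split_indices(12000, 200)
print(len(train), len(test))                  # 200 11800
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))   # 0.5
```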
