Thread: What to do when number of variables is greater than number of observations?

1. What to do when number of variables is greater than number of observations?

What are some techniques that are utilized when the number of predictors is greater than the number of observations?

2. Re: What to do when number of variables is greater than number of observations?

All-subsets regression is one approach that is used in this situation. The technique involves fitting every possible linear model at each subset size (that is, at each level of model sparsity). Forward stepwise regression is sometimes used in this case as well.

3. Re: What to do when number of variables is greater than number of observations?

1) Gather more data (probably the best way)
2) Remove or collapse variables. For example, you may decide that one variable really is not critical, that one variable can stand in for two, and so on. One possibility is to create an index variable that adds several of your variables together. If you have Likert scale data, a second advantage of this type of combination is that the result will likely be interval-level, while (according to some statisticians, anyhow) Likert data normally is not.
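
As a rough illustration of the index-variable idea, the sketch below sums several Likert items into one composite score per respondent. The item names and responses are made up for the example, and Python is used only for illustration.

```python
# Hypothetical responses: each row is one respondent, each key a Likert item (1-5).
responses = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": 1, "q3": 2},
    {"q1": 5, "q2": 4, "q3": 5},
]

# Items assumed (for illustration only) to measure the same underlying construct.
items = ["q1", "q2", "q3"]

def composite_index(row, item_names):
    """Collapse several item scores into a single summed index."""
    return sum(row[name] for name in item_names)

index_scores = [composite_index(row, items) for row in responses]
print(index_scores)  # [12, 5, 14]
```

Three predictors become one, which is the whole point when variables outnumber observations. Whether the items really belong together is a substantive question the code cannot answer.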

4. Re: What to do when number of variables is greater than number of observations?

I tried using subset selection in R, but I have over 500 variables. R just hangs when I use the leaps package.
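
A quick back-of-the-envelope count suggests why an exhaustive search stalls here. The sketch below assumes each of the 500 predictors is simply in or out of the model; it only counts candidate models and does not reproduce what any particular package does internally.

```python
import math

p = 500  # number of candidate predictors mentioned in the post

# Every predictor is either included or excluded, so there are 2**p subsets.
n_models = 2 ** p
print(len(str(n_models)))  # 151 (decimal digits in the model count)

# Even restricting the search to subsets of exactly 5 predictors
# leaves a very large number of candidate models.
n_size5 = math.comb(p, 5)
print(n_size5)
```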

5. Re: What to do when number of variables is greater than number of observations?

If you have 500 variables (something that I find amazing) you might try factor analysis and use the factors rather than the variables in your model if that makes conceptual sense.

6. Re: What to do when number of variables is greater than number of observations?

I would also suggest all possible subsets using Mallows' Cp as a criterion. SAS handles this well.
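
For reference, the usual textbook form of Mallows' Cp for a candidate submodel is sketched below. The notation is the standard one and is not taken from the post: SSE_p is the residual sum of squares of a submodel with p parameters, n is the number of observations, and the variance estimate comes from the full model with K parameters.

```latex
C_p = \frac{SSE_p}{\hat{\sigma}^2} - n + 2p,
\qquad
\hat{\sigma}^2 = \frac{SSE_{\text{full}}}{n - K}
```

Because the variance estimate divides by n - K, it is not defined when the full model has at least as many parameters as observations, so in that setting the criterion would need a variance estimate from somewhere else.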

7. Re: What to do when number of variables is greater than number of observations?

Is it even possible to estimate the parameters if the number of variables is greater than the number of observations? I guess the degrees of freedom drop below 0 in that case, which means that you cannot find any solution (or are there infinitely many solutions? I don't remember).
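
A tiny worked example may help with the "no solution or infinitely many" question. The sketch below uses made-up data with 2 observations and 3 predictors and shows several different coefficient vectors that all reproduce y exactly, so least squares alone cannot pick one.

```python
# Toy data (made up): 2 observations, 3 predictors, so variables outnumber rows.
X = [
    [1, 0, 1],
    [0, 1, 1],
]
y = [1, 2]

def fitted(X, beta):
    """Return the fitted values X @ beta using plain Python lists."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# A direction that X maps to zero: adding any multiple of it leaves the fit unchanged.
null_direction = [1, 1, -1]

base = [1, 2, 0]  # one coefficient vector that reproduces y exactly
for t in [0, 1, -3]:
    beta = [b + t * d for b, d in zip(base, null_direction)]
    print(beta, fitted(X, beta))  # fitted values are [1, 2] every time
```

Each choice of t gives a perfect fit, so there are infinitely many solutions rather than none. Extra structure (fewer variables, components, or some penalty) is what makes the estimate unique.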

8. Re: What to do when number of variables is greater than number of observations?

As far as I know, you cannot estimate a model with more parameters than observations. Unique parameters cannot be estimated with 0 or negative DF (I don't understand conceptually what a negative DF even means; it is sort of like something dissolving before it enters solution in a pH problem).

9. Re: What to do when number of variables is greater than number of observations?

Poster,

You have received some great suggestions here. Two questions. First, are you testing these potential independent variables because they make reasonable sense as predictors, or is this a fishing expedition? Second, can you share the context of the scenario? (This might open the door for others who perform comparable research to describe how their techniques apply to your situation.)

10. Re: What to do when number of variables is greater than number of observations?

This will go down in history as the 1000th post of hlsmith!!!!!!!!!!!!!!!!!!!!!!!!!

12. Re: What to do when number of variables is greater than number of observations?

Other possibilities are PCR (which builds on PCA) and PLS. If hlsmith had not cut down on post length, I would have told you what these abbreviations mean and given some application examples. I don't want to leave a "monster" to read.
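
For readers unfamiliar with the abbreviations: PCR is principal component regression, PCA is principal component analysis, and PLS is partial least squares. Below is a minimal sketch of PCR with a single component, on made-up data with more predictors than observations. Power iteration is swapped in for a proper eigen or SVD routine so the example needs no libraries; a real analysis would use an established implementation and choose the number of components by validation.

```python
# Toy data (made up): 3 observations, 4 strongly related predictors.
X = [
    [1.0, 2.0, 1.5, 0.9],
    [2.0, 4.1, 3.0, 2.1],
    [3.0, 6.0, 4.4, 3.0],
]
y = [1.1, 2.0, 3.1]
n, p = len(X), len(X[0])

# Center the predictors and the response.
x_means = [sum(row[j] for row in X) / n for j in range(p)]
y_mean = sum(y) / n
Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
yc = [v - y_mean for v in y]

# Cross-product matrix Xc'Xc (proportional to the sample covariance matrix).
S = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]

# Power iteration for the leading eigenvector, i.e. the first principal direction.
v = [1.0] * p
for _ in range(200):
    w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
    norm = sum(c * c for c in w) ** 0.5
    v = [c / norm for c in w]

# Scores on the first component, then a simple regression of y on that one score.
scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
gamma = sum(s * t for s, t in zip(scores, yc)) / sum(s * s for s in scores)

# Map the single coefficient back to the original four predictors.
beta = [gamma * vj for vj in v]
predictions = [y_mean + sum(r[j] * beta[j] for j in range(p)) for r in Xc]
print([round(b, 3) for b in beta])
print([round(f, 2) for f in predictions])
```

Only one coefficient (gamma) is estimated from the data, so the fit is well defined even though there are four predictors and three rows.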

13. Re: What to do when number of variables is greater than number of observations?

Whenever I run factanal I get this error:

    Error in solve.default(cv) :
      system is computationally singular: reciprocal condition number = 3.80806e-21

This means that I cannot use factor analysis.
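
One likely reason for a "computationally singular" message in this setting is that a covariance or correlation matrix built from fewer observations than variables cannot have full rank. The sketch below checks this on made-up data with 2 observations and 3 variables, using exact fractions so the determinant is exactly zero; it illustrates the general point and does not reproduce factanal's internals.

```python
from fractions import Fraction

# Toy data (made up): 2 observations on 3 variables (fewer rows than columns).
data = [
    [Fraction(1), Fraction(4), Fraction(2)],
    [Fraction(3), Fraction(1), Fraction(5)],
]
n, p = len(data), len(data[0])

means = [sum(row[j] for row in data) / n for j in range(p)]
centered = [[row[j] - means[j] for j in range(p)] for row in data]

# Sample covariance matrix with the usual n - 1 divisor.
cov = [
    [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
    for i in range(p)
]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

print(det3(cov))  # 0
```

A zero determinant means the matrix has no inverse, which is the situation a routine like solve would report as singular.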

14. Re: What to do when number of variables is greater than number of observations?

Do you mean that I should run a logistic regression with Mallows Cp as the criterion?

15. Re: What to do when number of variables is greater than number of observations?

I don't know that code, but my guess is that you cannot run the EFA with the number of observations you have. You need enough data to calculate unique parameters, and I don't think you have that. Is it possible to gather more data?

16. Re: What to do when number of variables is greater than number of observations?

I have 12000 observations but can only use 200 of them for the training set; the rest are for the test set. So I cannot gather more data. The goal is to build a predictive model that has good predictive accuracy on the test set.
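
If the split is not already fixed by whoever supplied the data, the setup described above could be sketched as follows. Row indices stand in for the real observations, and the seed is an arbitrary choice made only so the split is reproducible.

```python
import random

n_total = 12000  # total observations mentioned in the post
n_train = 200    # size of the training set mentioned in the post

rng = random.Random(0)  # fixed seed (arbitrary) for a reproducible split
all_rows = list(range(n_total))  # row indices stand in for the real data

# Draw the training rows without replacement; everything else is the test set.
train_idx = sorted(rng.sample(all_rows, n_train))
train_set = set(train_idx)
test_idx = [i for i in all_rows if i not in train_set]

print(len(train_idx), len(test_idx))  # 200 11800
```

Any variable selection or dimension reduction would then be fitted on the 200 training rows only, with the remaining rows held back to measure predictive accuracy.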