More predictors than observations?


New Member
What does it mean when statisticians talk about having more predictors than observations in a regression model? How could that even be possible? Why is it a problem in regression? Apologies, I am new to quant analysis and stats, so I am not quite sure why this is the case. I would appreciate the simplest possible explanation.


Less is more. Stay pure. Stay poor.
Look up "sparsity" in data or the "curse of dimensionality". The issue is that your data are wider than they are long: there are more predictors (columns) than observations (rows), so the sample cannot adequately represent all the combinations of predictor values. As an extreme example, say you have three observations and five predictors (age, gender, race, employment status, education). Most combinations of these predictors will have no observations at all, e.g., no one who is old, female, African, and uneducated. This comes up in practice when studying a disease with thousands of genes as predictors and a finite sample size.
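To make the three-observations, five-predictors example concrete, here is a rough sketch in Python with NumPy. Everything in it is made up for illustration: the predictor values are random numbers and the outcome is pure noise with no relationship to the predictors. It is only meant to show one consequence of wide data, namely that ordinary least squares can fit the sample perfectly and that more than one set of coefficients does so.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "wide" data: 3 observations (rows), 5 predictors (columns).
n_obs, n_pred = 3, 5
X = rng.normal(size=(n_obs, n_pred))  # made-up predictor values
y = rng.normal(size=n_obs)            # outcome is pure noise, unrelated to X

# Least squares fit. With more columns than rows there are infinitely many
# exact solutions; lstsq returns the one with the smallest norm.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Build a second, different coefficient vector that fits just as well by
# adding a direction that X maps to (numerically) zero.
_, _, vt = np.linalg.svd(X)
null_direction = vt[-1]
beta_alt = beta + 10.0 * null_direction

print("rank of X:", rank, "out of", n_pred, "predictors")
print("largest residual, first fit:", np.abs(residuals).max())
print("largest residual, second fit:", np.abs(y - X @ beta_alt).max())
```

Both fits reproduce the noise outcome essentially exactly, even though the coefficients differ, so the data alone cannot tell you which predictors matter. That is the regression side of the sparsity problem described above.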