Simple Question


New Member
If you wish to perform a simple linear regression on two variables but have no intuition as to which should be the independent variable, which should you use? I think the answer depends on the variance of each variable: given that the variance of the independent variable is in the denominator of the slope equation, you should use the higher-variance variable as the independent variable. But I wasn't sure if I was correct. Thanks.
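For reference, the ordinary least-squares slope the question alludes to is the sample covariance divided by the sample variance of whichever variable is treated as independent. Swapping the roles puts the other variance in the denominator, so the two slopes are not reciprocals of each other:

```latex
\hat\beta_{y \mid x} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad
\hat\beta_{x \mid y} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(y)}, \qquad
\hat\beta_{y \mid x}\,\hat\beta_{x \mid y} = \frac{\operatorname{cov}(x, y)^2}{\operatorname{var}(x)\,\operatorname{var}(y)} = r^2
```

Because the product of the two slopes is r^2 rather than 1, the two fitted lines coincide only when |r| = 1.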


New Member
Which variable should be treated as independent cannot be determined mathematically. You need to rely on theory to decide this.

From a mathematical standpoint it doesn't matter which of the variables you call independent: the least-squares fit can be computed in either direction.


New Member
I recommend using a sound theory. For example, if you have the two variables GDP growth and the return of a single stock, it should be obvious that GDP growth can influence the return of that single stock, but not the other way round.

Use regression if you want to predict/estimate one variable from another. In some cases the context makes it clear which variable to use, as in the example above of GDP growth and the return of a single stock. Note that, in general, regressing x on y will not give the same fitted line as regressing y on x.

If you do not know what you are trying to predict, my suggestion would be to use the variable that is easier to measure as the independent variable. That way you are predicting/estimating something that is hard to measure from something that is easy to measure. For an absurd example, say there is some disease that appears to be related to blood pressure (the magnitude of the correlation is close to 1), and you do not know whether the disease causes high blood pressure or high blood pressure causes the disease. You are a total quack doctor and cannot make up your mind which is more important to know for a fact: blood pressure or the extent of the disease. You could cut a patient open to measure the extent of the disease and use that to predict blood pressure, or measure their blood pressure and use that to predict the extent of the disease.

Blood pressure is the easier of the two to measure, so use it as the independent variable.

Before you run the regression, first check whether there is a relationship at all and how strong it is (for example, with a scatter plot and the correlation coefficient).

You would be surprised how often this comes up in social science. For example, job satisfaction can be an outcome, an antecedent, or an intervening variable. Your model depends on what you are trying to answer.