# Is this really independent?

#### noetsi

##### Fortran must die
I read an article today that made me wonder. They used linear regression to predict income at closure. One of their predictor variables was income earlier in the process. To me that seems to violate the assumption that the observations are independent (they are relating someone's income at acceptance to what it is when they leave, with the later income as the DV). It would seem the SE would be wrong this way because of autocorrelation.


#### Dason

The assumption is that the error terms are independent. If we had to assume that the predictors and the response were independent then we wouldn't really get anywhere. It would only be an issue if you had multiple responses (you can think of this as 'rows' in your data set) that corresponded to the same person/whatever.
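To make that concrete, here is one way to write the model down. The symbols are my own labels rather than anything from the article: $y_i$ is person $i$'s income at closure and $x_i$ is the same person's income earlier in the process.

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \text{ for } i \neq j
$$

The independence assumption sits on the $\varepsilon_i$ across different people. Nothing requires $x_i$ and $y_i$ to be unrelated; whenever $\beta_1 \neq 0$ they are correlated by construction.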

#### noetsi

> It would only be an issue if you had multiple responses (you can think of this as 'rows' in your data set) that corresponded to the same person/whatever.

Isn't that exactly what is occurring when you measure income at t1 and t2? It would seem autocorrelation would be an issue.

#### spunky

##### Smelly poop man with doo doo pants.
> Isn't that exactly what is occurring when you measure income at t1 and t2? It would seem autocorrelation would be an issue.

it depends on how you structure your data i guess. for instance, if your data looks like this (in so-called 'short form', more commonly called 'wide format'):

Code:
Person     Income T1      Income T2

noetsi     blah blah      blah blah
Dason      bleh bleh      bleh bleh
spunky     blih blih      blih blih

then we're row-independent, because noetsi's income (in theory) should not be correlated with Dason's income, which should not be correlated with spunky's income and whatnot. because we're different people doing different things and yadda yadda. the columns, however, are NOT independent since, as you correctly pointed out, income at Time 1 and at Time 2 should be (highly) correlated, other things being equal. that's fine though: the regression of Time 2 on Time 1 is modelling exactly that correlation.
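a rough sketch of the wide-format case, in case it helps. everything here is made up for illustration: the simulated incomes, the `pearson` helper and the variable names are assumptions, not anything from the article. it regresses income at Time 2 on income at Time 1 with one row per person and then checks whether residuals from neighbouring rows are correlated.

```python
import random

random.seed(0)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical simulated data in wide format: one row per person.
# A stable person-specific earning level drives both incomes.
n_people = 500
wide = []
for person_id in range(n_people):
    level = random.gauss(30_000, 8_000)
    income_t1 = level + random.gauss(0, 2_000)          # income at acceptance
    income_t2 = 5_000 + level + random.gauss(0, 2_000)  # income at closure
    wide.append((person_id, income_t1, income_t2))

x = [row[1] for row in wide]
y = [row[2] for row in wide]

# The two columns are strongly correlated, as expected.
column_corr = pearson(x, y)

# Simple OLS of income_t2 on income_t1 (closed form for one predictor).
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Each residual belongs to a different person, so residuals from
# neighbouring rows should be roughly uncorrelated.
row_corr = pearson(residuals[:-1], residuals[1:])

print(f"corr(T1, T2)              = {column_corr:.2f}")
print(f"corr(resid_i, resid_i+1)  = {row_corr:.2f}")
```

under these assumptions the column correlation should come out high while the row-to-row residual correlation hovers near zero, which is the situation OLS is happy with.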

but if you change it to be in so-called 'long format'

Code:
Person    Income     Time

noetsi    blah         1
noetsi    blah         2
Dason     bleh         1
Dason     bleh         2
spunky    blih         1
spunky    blih         2

then we're no longer row-independent, and straightforward OLS regression would have issues. that's why, when you were doing your MA and used the HLM software to run multilevel models for time-dependent data, HLM wanted you to either give it data in "long format" or would convert it to "long format" itself, to create the dependency structure it needs to work its magic.
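and a matching sketch for the long-format case, again on made-up data with hypothetical names (`pearson`, `long_rows`, etc. are my own). it stacks the two incomes so each person contributes two rows, fits a naive OLS of income on time that ignores who each row belongs to, and then checks how correlated the two residuals from the same person are.

```python
import random

random.seed(1)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical simulated data in wide format: one row per person.
n_people = 500
wide = []
for person_id in range(n_people):
    level = random.gauss(30_000, 8_000)  # person-specific earning level
    wide.append((person_id,
                 level + random.gauss(0, 2_000),           # income at Time 1
                 5_000 + level + random.gauss(0, 2_000)))  # income at Time 2

# Reshape to long format: two rows per person (person, time, income).
long_rows = []
for person_id, income_t1, income_t2 in wide:
    long_rows.append((person_id, 1, income_t1))
    long_rows.append((person_id, 2, income_t2))

# Naive OLS of income on time, ignoring which person each row belongs to.
times = [row[1] for row in long_rows]
incomes = [row[2] for row in long_rows]
mean_t = sum(times) / len(times)
mean_y = sum(incomes) / len(incomes)
slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, incomes))
         / sum((t - mean_t) ** 2 for t in times))
intercept = mean_y - slope * mean_t
residuals = [y - (intercept + slope * t) for t, y in zip(times, incomes)]

# Rows alternate Time 1 / Time 2 for each person, so slicing by parity
# pairs up the two residuals that belong to the same person.
resid_t1 = residuals[0::2]
resid_t2 = residuals[1::2]
within_person_corr = pearson(resid_t1, resid_t2)

print(f"rows in long format         = {len(long_rows)}")
print(f"within-person resid. corr.  = {within_person_corr:.2f}")
```

with a shared person-level component like the one simulated here, the two residuals per person come out strongly correlated, which is the dependence a multilevel model is built to account for and plain OLS is not.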