For 100 companies, I have collected (i) `tweets` and (ii) corporate website `pageviews` for `148` days. The daily tweet volume and daily pageviews are the two independent variables, compared against the stock `trading volume` for each company, resulting in 100 x 148 = 14,800 observations. My data is structured like this:
Because company size varies widely (some companies receive only 2 tweets per day, whereas others like Apple get over 10,000 per day), all variables are log-transformed to smooth their distributions. (This is in line with previous research; this is for my thesis.)
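To make the effect of the transformation concrete, here is a minimal sketch in Python (for illustration only; the actual analysis was done in SPSS). The values 2 and 10,000 come from the description above, the value 150 is made up, and the natural log is an assumption, since the base is not essential here.

```python
import numpy as np

# Hypothetical daily tweet counts for three companies.
# 2 and 10,000 are the extremes mentioned above; 150 is invented for illustration.
tweets_per_day = np.array([2.0, 150.0, 10000.0])

# Natural log (assumed base). Counts of zero would need separate handling,
# e.g. log(x + 1), which is also an assumption and not taken from the post.
log_tweets = np.log(tweets_per_day)

raw_ratio = tweets_per_day.max() / tweets_per_day.min()  # spread on the raw scale
log_range = log_tweets.max() - log_tweets.min()          # spread on the log scale

print(f"raw max/min ratio: {raw_ratio:.0f}")
print(f"log-scale range:   {log_range:.2f}")
```

On the raw scale the largest company dominates by several orders of magnitude; on the log scale the same companies sit within a few units of each other.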
I just ran a linear regression on this data, including both independent variables. R-squared is .411, but the Durbin-Watson statistic is only .141 (!). Without looking up the exact bounds, I know this means my residuals are not independent, i.e. they are autocorrelated, right?
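For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares, which is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals. A value of .141 would therefore correspond to r of roughly .93. The sketch below is a minimal Python illustration on simulated residuals, not on my data; the function name `durbin_watson`, the seed, and the AR(1) coefficient of 0.93 are all assumptions chosen for the example.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 5000

# Case 1: independent residuals (white noise), DW should be near 2.
white = rng.normal(size=n)

# Case 2: strongly autocorrelated residuals, simulated as an AR(1) process
# with an assumed coefficient of 0.93, so DW should be near 2 * (1 - 0.93).
ar = np.empty(n)
ar[0] = white[0]
for t in range(1, n):
    ar[t] = 0.93 * ar[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar = durbin_watson(ar)

print(f"DW, independent residuals:    {dw_white:.3f}")
print(f"DW, autocorrelated residuals: {dw_ar:.3f}")
```

Note that the statistic depends on the order of the rows: it compares each residual with the one in the row directly before it, whatever that row happens to be.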
My question is: how can I solve this? When I think about it, this data should not be autocorrelated, so I don't really understand where it comes from. Is it because this is actually a time series analysis? I wouldn't think so either, since, for instance, today's trading volume is independent of yesterday's trading volume. Can somebody explain this to me?
P.S. At my university, we use SPSS/PASW without additional modules, so I am unable to run a time series analysis on this as one could in Stata or R.