For 100 companies, I have collected (i) `tweets` and (ii) corporate website `pageviews` for `148` days. The daily tweet volume and daily pageviews are two independent variables, which I compare against each company's stock `trading volume`. This gives 100 × 148 = 14,800 observations.
Because company size varies a lot (some companies receive only 2 tweets per day, whereas others, like Apple, get over 10,000 per day), all variables are logged to make their distributions less skewed. (This is in line with previous research; this is for my thesis.)
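A minimal sketch of the log transformation in Python (not SPSS), assuming natural logs; the post does not say which base was used. The function name `log_row` is illustrative only. Note that log(0) is undefined, so days with zero tweets, if there are any (the post does not say), would need separate handling such as log(1 + x), available as `math.log1p`.

```python
import math


def log_row(tweets, pageviews, trading_volume):
    """Natural-log transform of one company-day (assumes all values > 0)."""
    return tuple(math.log(v) for v in (tweets, pageviews, trading_volume))


# Two rows from the sample table: company 1 on day 1, company 100 on day 148.
print([round(v, 2) for v in log_row(200, 150, 2423325)])
print([round(v, 2) for v in log_row(3, 155, 32324)])

# Raw tweet counts of 2 vs 10,000 differ by a factor of 5,000;
# on the log scale the gap is only about 8.5 units.
print(round(math.log(10_000) - math.log(2), 2))
```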
A sample of the raw data (before logging):
company  date  tweetVol  pageviewVol  tradingVol
------------------------------------------------
1        1     200       150          2423325
1        2     194       152          2455343
1        3     214       199          3100429
.        .     .         .            .
.        .     .         .            .
1        148   205       233          2563463
2        1     752       932          7434124
2        2     932       2423         7464354
2        3     600       1435         5324323
.        .     .         .            .
.        .     .         .            .
.        .     .         .            .
100      148   3         155          32324
I just ran a linear regression on this data with both independent variables. R-squared is .411, but the Durbin-Watson statistic is only .141 (!). Even without looking up the exact bounds, I know this means my residuals are strongly (positively) autocorrelated, right?
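For reference, the Durbin-Watson statistic is d = Σ(e_t − e_{t−1})² / Σ e_t², computed over the residuals in the order the rows appear in the file, and d ≈ 2(1 − ρ̂), where ρ̂ is the lag-1 autocorrelation of the residuals. Values near 2 suggest no first-order autocorrelation; values near 0 suggest strong positive autocorrelation. A minimal Python sketch (function names are illustrative, not SPSS output) of the computation and of what .141 roughly implies:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def implied_rho(dw):
    """Approximate lag-1 residual autocorrelation, from DW ≈ 2(1 - rho)."""
    return 1 - dw / 2


# Long runs of same-signed residuals give a small DW (4/6 ≈ 0.667).
print(durbin_watson([1, 1, 1, -1, -1, -1]))
# Residuals that flip sign every step give a large DW.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
# The value reported above corresponds to rho of roughly 0.93.
print(round(implied_rho(0.141), 2))  # 0.93
```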
My question is: how can I solve this? When I think about it, this data should not be autocorrelated, so I don't really understand. Is it because this is actually a time-series analysis? I wouldn't think so either, since, for instance, trading volume today is independent of yesterday's trading volume. Can somebody explain this to me?
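One possible mechanism, offered as a hedged illustration rather than a diagnosis of this particular data: the 14,800 rows are stacked company by company, so Durbin-Watson treats consecutive rows as consecutive points in time. If each company has its own level (Apple's volume is always far above a small firm's) that a single pooled regression does not capture, the residuals stay on the same side of zero for most of a company's 148 rows, and DW falls far below 2 even when the day-to-day errors are independent. The simulation assumes a hypothetical model y = company level + 0.5·x + noise with independent noise; the level spread, slope and noise scale are made-up values. Subtracting each company's mean from x and y (the fixed-effects "within" transformation) brings DW back near 2.

```python
import random


def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def ols_fit(x, y):
    """Simple one-regressor OLS: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def demean_by_group(values, groups):
    """Subtract each company's own mean (the within transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_companies=100, n_days=148, seed=1):
    rng = random.Random(seed)
    company, x, y = [], [], []
    for c in range(n_companies):
        level = rng.gauss(0, 2)  # hypothetical company-specific level (size effect)
        for _ in range(n_days):
            xi = rng.gauss(0, 1)
            company.append(c)
            x.append(xi)
            # Errors are independent across days: no true autocorrelation.
            y.append(level + 0.5 * xi + rng.gauss(0, 1))
    return company, x, y


company, x, y = simulate()

# Pooled OLS on rows sorted by company, then day (as in the table).
b, a = ols_fit(x, y)
pooled_resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Same model after removing each company's mean from x and y.
xw, yw = demean_by_group(x, company), demean_by_group(y, company)
bw, aw = ols_fit(xw, yw)
within_resid = [yi - (aw + bw * xi) for xi, yi in zip(xw, yw)]

print(round(durbin_watson(pooled_resid), 2))  # well below 2
print(round(durbin_watson(within_resid), 2))  # close to 2
```

The within transformation gives the same slope as an ordinary linear regression that includes one dummy variable per company, which does not require a time-series module. Whether company effects explain the low DW in this data is something the sketch cannot settle.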
P.S. At my university we use SPSS/PASW without additional modules, so I am unable to perform a time-series analysis on this as you could in Stata or R.