#### carriedinterest

##### New Member
I've been tasked with conducting a statistical analysis and have encountered a few potentially significant problems. As it's been quite some time since I last worked with statistics, I was hoping that someone on the forum could advise on the most appropriate methodology for dealing with my data.

Project overview: I am attempting to build a regression model to project future volumes of a continuous variable. The dependent variable (DV) is financial in nature and contains 162 unique observations. The data are non-linear: volumes have increased exponentially over the past 4-5 years.

The independent variables:

1) Movements in a major equity index. The data show the strongest bivariate correlation with the dependent variable after a 2 quarter lag - I have accounted for this in the data by matching observation 1 of the index with observation 3 of the DV.

2) Movements in a major credit index. The data are similar to those described above, with the exception that the strongest bivariate correlation occurs with a 5 quarter lag.

3) A continuous, financial variable (in many ways similar to the DV, but with a clear hypothesized causal relationship).

4) A binary "dummy" variable designed to control for natural cyclicality in the DV.
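To make the lag alignment in items 1) and 2) concrete, here is a minimal sketch. The series are synthetic placeholders rather than the actual data, and the names `align_with_lag` and `lagged_correlations` are hypothetical helpers invented for illustration. With a lag of 2, observation 1 of the index is paired with observation 3 of the DV, as described above.

```python
import numpy as np

def align_with_lag(x, y, lag):
    """Pair x[t] with y[t + lag] and return two equal-length arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y) or lag < 0 or lag >= len(x):
        raise ValueError("series must have equal length and 0 <= lag < length")
    if lag == 0:
        return x.copy(), y.copy()
    # Drop the last `lag` values of x and the first `lag` values of y.
    return x[:-lag], y[lag:]

def lagged_correlations(x, y, max_lag):
    """Bivariate correlation between x and y for each lag from 0 to max_lag."""
    return {
        lag: float(np.corrcoef(*align_with_lag(x, y, lag))[0, 1])
        for lag in range(max_lag + 1)
    }

# Synthetic stand-ins (assumption): the DV follows the index two quarters later.
rng = np.random.default_rng(0)
equity = rng.normal(size=162)
dv = np.empty(162)
dv[:2] = rng.normal(size=2)
dv[2:] = equity[:-2] + 0.1 * rng.normal(size=160)

corrs = lagged_correlations(equity, dv, max_lag=6)
best_lag = max(corrs, key=lambda k: abs(corrs[k]))
x_aligned, y_aligned = align_with_lag(equity, dv, best_lag)
print(best_lag, len(x_aligned))
```

Note that each lag shortens the usable sample, so a 5 quarter lag on one regressor leaves at most 157 of the 162 observations for the joint regression.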

These data were used in a multiple regression analysis. The model has a high R squared and each of the coefficients has a p-value less than .05. However, I am concerned that the data suffer from a number of common statistical ailments including:

- Heteroskedasticity: the events of the past four quarters have severely warped the "linear" relationship between the variables - a plot of the residuals reveals a small band of irregular observations.

- Multicollinearity: the independent variables are correlated with one another to a fairly strong degree. I suspect this is an inherent limitation of applying statistical methods to financial issues (as all indicators tend to be correlated), which may be why models of this type are not commonly used by financial institutions.

- On a related note, I worry that the events of the past few quarters have had a disproportionately large effect on the coefficients - the correlations between the independent variables (while extant in any case) have increased rather significantly over the past few periods. Thus, I worry that the inclusion of data from this period exaggerates the strength of the coefficients and buggers the regression equation. For example, the Y-intercept of the model output is very low, and starting from such a low (negative) baseline renders it impossible to accurately project the DV regardless of the magnitude of movements in the independent variables.
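The three concerns above can each be quantified before deciding how to treat the data. The following is a rough sketch under stated assumptions, not a definitive procedure: the data are synthetic, the helper names `ols`, `vif`, and `breusch_pagan_lm` are hypothetical, and the heteroskedasticity check uses the n·R² (Koenker) form of the Breusch-Pagan statistic. It computes variance inflation factors for multicollinearity, an LM statistic for heteroskedasticity, and a refit that drops the last four quarters to see how far the coefficients move.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residuals, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2 of that column on the rest)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        _, _, r2 = ols(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

def breusch_pagan_lm(X, resid):
    """LM statistic n * R^2 from regressing squared residuals on the regressors.
    Under homoskedasticity it is roughly chi-squared with X.shape[1] degrees of freedom."""
    _, _, r2 = ols(X, resid ** 2)
    return len(resid) * r2

# Synthetic stand-ins (assumption): two nearly collinear regressors, one independent
# regressor, and error variance that jumps in the last four periods.
rng = np.random.default_rng(1)
n = 162
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
noise = rng.normal(size=n)
noise[-4:] *= 10.0
y = 1.0 + 2.0 * x1 + 1.0 * x3 + noise

beta_full, resid_full, r2_full = ols(X, y)
beta_trim, _, _ = ols(X[:-4], y[:-4])      # refit without the last four quarters
vifs = vif(X)
lm_stat = breusch_pagan_lm(X, resid_full)

print("VIFs:", np.round(vifs, 2))
print("Breusch-Pagan LM:", round(float(lm_stat), 2))
print("Coefficient shift when dropping last 4 obs:", np.round(beta_full - beta_trim, 3))
```

A large gap between the full-sample and trimmed coefficients would suggest that the recent quarters are driving the fit. VIFs well above the commonly cited rule-of-thumb range of 5 to 10 would suggest that the individual coefficients are poorly determined even when the overall R squared is high.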

Any ideas on how best to treat the data? Is the project able to be salvaged, or should I throw in the towel?