I have done my best to do an extensive literature review to answer this question, but I've frustratingly found nothing. So I would appreciate any and all help.
I'm doing research for a professor and he has asked me to find a proper statistical test for this project. Here are the details (as best I know them):
- There are two separate hedonic regressions: one from the pre-recession period, 2005-2007, and another from the post-recession period, 2009-2011. We want to compare the corresponding coefficients from each regression to determine whether they are statistically different from each other (i.e., is the effect of lot size on home price the same pre- and post-recession?).
- Apparently, a standard t-test is not appropriate, according to the professor.
- He is resistant to combining the data sets, adding a dummy variable for post-recession sales, creating interaction terms, and testing their significance, because he suspects there would be severe collinearity problems. I gather the level of multicollinearity is already quite high, which is also why he does not trust a basic t-test to begin with.
So I'm looking for any methods you may know of to effectively compare these regression coefficients. If you know of any academic papers that have an example of such a test, that would be phenomenal as well.
Thank you for your time.
Best approach: merge the datasets if they were sampled the same way, address the collinearity, add a variable for time period, and look at the interactions. Other approaches would be compromises.
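A minimal sketch of that pooled approach, using simulated data (all variable names and numbers here are hypothetical, not from the actual project): stack both periods, add a post-recession dummy, interact it with the predictor of interest, and t-test the interaction coefficient, which directly estimates the change in slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated cross-sections: lot size -> price,
# with a smaller slope after the recession.
n = 200
lot_pre = rng.uniform(1, 10, n)
lot_post = rng.uniform(1, 10, n)
price_pre = 2.0 + 0.50 * lot_pre + rng.normal(0, 0.5, n)
price_post = 1.5 + 0.30 * lot_post + rng.normal(0, 0.5, n)

# Pool the data; add a post-recession dummy and its interaction with lot size.
lot = np.concatenate([lot_pre, lot_post])
post = np.concatenate([np.zeros(n), np.ones(n)])
y = np.concatenate([price_pre, price_post])
X = np.column_stack([np.ones(2 * n), lot, post, post * lot])

# OLS; the coefficient on post*lot is the pre/post change in the lot-size slope.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = X.shape[0] - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_interaction = beta[3] / se[3]
print(f"slope change = {beta[3]:.3f}, t = {t_interaction:.2f}")
```

If the interaction's t-statistic is large in magnitude, the slope differs between the periods; collinearity between `post` and `post * lot` inflates the standard error, which is exactly the concern the professor raised and exactly what this run would reveal or rule out.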
I've asked to do it this way, but I've been dismissed each time. This professor is so sure that someone else has done this sort of comparison already, but hours of searching suggests otherwise.
Thanks for your answer anyway. Continuing to accept alternatives.
Would the professor be willing to combine the data from both time periods and run simple tests, one independent variable at a time, and build up that way? I've seen effects-testing models like these built up that way, at least to gain understanding, rather than dumping everything in at once. It seems this would help with the multicollinearity concern because you're only looking at one variable at a time. The only other ways I can think of are also one-variable-at-a-time approaches, e.g., a nonparametric trend test to see whether the rate of price across lot size differs between the two "groups" (time periods).
Try a "Chow test" (search for it) if you want to compare the corresponding coefficients from each regression to determine whether they are statistically different from each other.
But if all you want is to compare one coefficient (i.e., is the effect of lot size on home price the same pre- and post-recession?), it is just a t-test of beta_pre versus beta_post (where the standard errors are given by the respective regressions).
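Since the two coefficients come from independent samples, that comparison can be sketched as a z-statistic, (b_pre - b_post) / sqrt(se_pre² + se_post²). The numbers below are hypothetical, purely for illustration:

```python
import math

def coef_diff_z(b_pre, se_pre, b_post, se_post):
    """z-statistic for the difference between two coefficients
    estimated on independent samples."""
    return (b_pre - b_post) / math.sqrt(se_pre**2 + se_post**2)

# Hypothetical lot-size coefficients and standard errors
# from the pre- and post-recession regressions:
z = coef_diff_z(b_pre=0.50, se_pre=0.04, b_post=0.30, se_post=0.05)
print(f"z = {z:.2f}")  # compare to +/-1.96 for a 5% two-sided test
```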
The Chow test is a test for a structural break. If you find none, then it is reasonable to combine the data; if you do find one, then you cannot.
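The Chow F-statistic compares the pooled residual sum of squares against the sum of the per-group RSS: F = ((RSS_pooled - (RSS_1 + RSS_2)) / k) / ((RSS_1 + RSS_2) / (n_1 + n_2 - 2k)). A minimal sketch on simulated data (all names and numbers hypothetical):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def chow_F(X1, y1, X2, y2):
    """Chow F-statistic for H0: one coefficient vector fits both groups,
    against H1: each group has its own vector. k includes the intercept."""
    k = X1.shape[1]
    rss_sep = rss(X1, y1) + rss(X2, y2)
    rss_pooled = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    dof = len(y1) + len(y2) - 2 * k
    return ((rss_pooled - rss_sep) / k) / (rss_sep / dof)

# Hypothetical example: the slope changes between the two periods.
rng = np.random.default_rng(1)
x1 = rng.uniform(1, 10, 150)
x2 = rng.uniform(1, 10, 150)
X1 = np.column_stack([np.ones(150), x1])
X2 = np.column_stack([np.ones(150), x2])
y1 = 2.0 + 0.5 * x1 + rng.normal(0, 0.3, 150)
y2 = 2.0 + 0.2 * x2 + rng.normal(0, 0.3, 150)
F = chow_F(X1, y1, X2, y2)
print(f"F = {F:.1f}")  # compare to an F(k, n1+n2-2k) critical value
```

Note this is algebraically identical to the F-test that all the dummy and interaction terms in the pooled regression are jointly zero.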
If you were using time-series data, I would think you would need to run a general Durbin test for autocorrelation (the better-known Durbin-Watson test only detects first-order AC). I would be more concerned with that than with multicollinearity.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Yes, but that's not what he wants to do. I guess he believes the multicollinearity is too high to make such a test reliable. I honestly do not know anymore. I was just told to find an alternative to doing t-tests.
It's not time series, though. It's a cross sectional data set. We're not observing the value of the same property over time. It's just a set of all transactions that occurred in the pre-recession period and in the post-recession period.
Still, I appreciate all the suggestions. I'll look into some of them. But I'm getting less and less hopeful.
Have you actually done a multicollinearity (MC) test to see if his concerns are justified? I would start there (of course if they are not you get to figure out how to tell him he is wrong). There is no use dealing with an issue that does not in fact exist.
I assume you want to compare individual predictors, not just the overall model fit (the latter won't be affected by MC). Even if you have MC, it won't affect the estimated effect sizes, just the tests of significance. So you can compare the effect size of each slope across the two regressions even with high MC (although you won't know whether the difference is statistically significant).
There are methods to address MC. You might look at John Fox's "Regression Diagnostics" by Sage.
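One direct way to check whether the MC concern is justified is to compute variance inflation factors: regress each predictor on the others and take VIF_j = 1 / (1 - R_j²), with values above roughly 10 being the usual warning sign. A self-contained sketch on hypothetical data:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column):
    regress each predictor on the others and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, _, _, _ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return out

# Hypothetical predictors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=500)
x3 = rng.normal(size=500)
print(vif(np.column_stack([x1, x2, x3])))
```

Here the VIFs for the first two columns come out very large while the third stays near 1, so the diagnostic pinpoints which predictors are actually entangled rather than condemning the whole model.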
Agree with noetsi on verifying MC.
I don't have access to the data or program he's written, so I can't really test it. So you can see my frustration here. I'm on a wild goose chase for a fix to a problem that I don't even understand the extent of. I'm losing it...
Yes, we want to compare the individual predictors. Previous literature just looks at the two coefficients side by side and compares their effects on price, but that ignores the general price level (sure, a feature may raise price by less post-recession, but that doesn't account for the fact that prices in general fell after the recession). That's why we need to test whether the coefficients are statistically different.
I understand your frustration. If the professor wants you to do this project, he really should give you access to the data. How are you planning to test the significance of the change in slope? What you would have to do, I assume, is test whether the change in slope for each predictor is significant. I have never seen that done with multiple predictors this way.
I think ultimately you have only one choice if you have MC and you want to test the significance of individual predictors: get rid of the MC. I do not believe any statistical method inherently eliminates it (it is tied to the structure of the data itself and ultimately to the way the IVs interact with each other). You might look at the Fox work I suggested, although solutions to MC are never simple (and some, such as combining individual variables, probably won't be acceptable to the professor given what you want to do).
Ultimately you can only do what is possible to do - even if your professor does not like that.
I just met with the professor and convinced him to let me try to run the regression with the interaction terms to see if his concerns were justified. Hopefully I can make some progress from there.
Thanks again.
If that is the issue then a Chow test - in my view - is the way to go.
A Chow test does not have to be used in a time-series model; it can equally well be used with cross-sectional data.
(I believe the test can be thought of as a likelihood-ratio test comparing a null hypothesis of the same parameter vector in both time periods against the alternative of different parameter vectors in the two periods.)
As I understand it, multicollinearity has nothing to do with this issue.
Is the Chow test used this way (which I am not familiar with, since I have only seen it used in time series) an omnibus test? Does it test that none of the parameters have changed (as a single calculation, somewhat like a model F-test in ANOVA)? Or does it test each parameter separately?
If it is an omnibus test, then unless all the parameters remain unchanged, you are not going to know which of them changed - and MC will be an issue.