Dear all,
Since the R² value of my regression model is one, I can conduct neither a t-test nor an F-test. Is this a problem with regard to hypothesis testing? I am asking because I cannot "prove" via the above-mentioned tests that the coefficients are statistically significant...
Many thanks for taking the time to post an answer!
Regards
Yosi
What does your model look like (how many and what kind of
variables are in the model), and how large is your sample size?
With kind regards
K.
I have never seen this before, but I could imagine it happening under the right circumstances.
Is one of your predictors just a proxy for, or another form of, the dependent variable?
Stop cowardice, ban guns!
My model looks like this:
Y = b1*x1 + b2*x2 + b3*x3
The values of Y are defined as the sum of the values of the predictor variables, so it's no surprise that R² equals 1. With the help of the regression analysis I tried to identify the predictor variable with the strongest influence on Y.
The sample size is n=15000.
Wouldn't the variables contributing most to the sum be the strongest predictors? Can you provide some context, so we can understand why you are doing this?
Could you calculate some type of average contribution based on each variable's proportion of the sum, and then test whether those proportions differ?
To give an example (3 out of 15000 observations):
Y; x1; x2; x3
10; 5; 3; 2
12; 2; 2; 8
15; 5; 1; 9
The question is: Which predictor is the strongest/weakest one if all observations are taken into account? I think that a regression analysis is a suitable tool for addressing this question. Please correct me if I am wrong.
Additional remark: Of course it would be necessary to compare the standardized coefficients since the unstandardized coefficients are all equal to one.
Last edited by yosi; 04-10-2014 at 06:47 AM.
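A quick numerical check of why this setup gives R² = 1 and unit coefficients, using the three example rows above (Python/numpy is my choice of tool here, not something from the thread):

```python
import numpy as np

# The three example observations from the post above.
X = np.array([[5., 3., 2.],
              [2., 2., 8.],
              [5., 1., 9.]])
y = X.sum(axis=1)  # Y = x1 + x2 + x3 by construction

# OLS without an intercept: solve for b in X @ b ~ y.
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

ss_res = np.sum((y - X @ b) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(b)          # all (unstandardized) coefficients are 1
print(r_squared)  # 1.0 -- a perfect fit, so residual-based tests break down
```

Because the residuals are exactly zero, the estimated error variance is zero and the usual t- and F-statistics are undefined, which is why the tests cannot be run.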
hi,
how about dividing the three variables by Y to get the percentage contributions and doing an ANOVA, or an appropriate non-parametric test? This does not look like a regression problem to me.
regards
rogojel
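rogojel's percentage idea can be sketched like this (again a numpy illustration of my own, on the three example rows from above):

```python
import numpy as np

X = np.array([[5., 3., 2.],
              [2., 2., 8.],
              [5., 1., 9.]])
y = X.sum(axis=1)

# Per-observation percentage contribution of each predictor to Y.
shares = X / y[:, None]
mean_share = shares.mean(axis=0)

print(shares)      # each row sums to 1, because Y is the row sum
print(mean_share)  # average contribution of x1, x2, x3
```

The mean shares can then be compared directly, or fed into an ANOVA or a non-parametric test as rogojel suggests.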
By strongest influence you just mean the greatest contribution, correct? As stated before, why can't you just determine their average proportional contribution (percentage)?
I would imagine Y is a perfect linear combination of your X in this case, which is why your model won't run. I am not sure why you would be interested in testing a model where Y is a combination of the X by definition. You know the X have to drive Y, and that nothing but these specific X influences it - you have defined it that way. So there is no point in testing this.
You could probably simulate changing the value of one X while holding each other X constant. Then find out from this simulation which of the variables has the greatest influence.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
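The simulation noetsi describes is easy to sketch, and because Y is the sum by definition it just confirms his point: every predictor has the same unit marginal effect (a numpy illustration of my own, using the example rows from above):

```python
import numpy as np

X = np.array([[5., 3., 2.],
              [2., 2., 8.],
              [5., 1., 9.]])

def model(X):
    return X.sum(axis=1)  # Y is defined as the row sum

base = model(X)
deltas = []
for j in range(3):
    X_pert = X.copy()
    X_pert[:, j] += 1.0  # bump one predictor, hold the others fixed
    deltas.append(float(np.mean(model(X_pert) - base)))

print(deltas)  # [1.0, 1.0, 1.0] -- identical marginal influence for every x
```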
noetsi is absolutely right. Regression is used only when the function relating Y to X's is not known. In your case, you know the exact contribution of each predictor X. To compare the contributions of predictors X1 and X2 you can compare Corr(Y,X1)^2 and Corr(Y,X2)^2. This approach compares the amount of variation in Y explained by X1 to that explained by X2.
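The Corr(Y, Xj)² comparison can be computed directly; here is a sketch on synthetic data (the distributions and scale values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: three independent predictors with different spreads.
X = rng.exponential(scale=[5.0, 2.0, 3.0], size=(1000, 3))
y = X.sum(axis=1)

# Squared correlation of Y with each predictor equals the share of Var(Y)
# that predictor explains (exactly so when the predictors are independent).
r2 = [np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(3)]
print(r2)  # x1, with the largest variance, explains the most
```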
I think noetsi may be trying to say that breaking it apart this way may not address possible collinearity between the predictors. It may not seem like it, but two of the covariates may move together - e.g., whenever A is bigger, so is B, while C is smaller. We don't know the context of your problem, so we can only point out plausible theoretical issues.
"I think noetsi may be trying to say that breaking it apart this way may not address possible collinearity between predictors."
Yes (I guess it would be multicollinearity, right?).
Thanks for this lively discussion, I really appreciate your contributions. I understand that there are alternative - maybe even better - ways to identify the contribution of each predictor. But nevertheless I still think that a regression analysis is not necessarily the wrong tool.
"I would imagine Y is a perfect linear combination of your X in this case which is why your model won't run."
I know that this is most unusual, but why is it a (statistical) problem? R² = 1 is an accepted value.
"Regression is used only when the function relating Y to X's is not known."
Only then? Why? I am interested in the amount of contribution of the X's - considering all observations. Why is a comparison of the standardized coefficients a wrong approach?
hi,
just my five cents:
your equation is precisely Y = x1 + x2 + x3, with all coefficients being 1 and no random term. Least squares is just not the right mathematical model, imho. From the equation's point of view, all terms have the same contribution.
Now, in reality it could happen that x1 is generally higher than x3, for instance. This would be a simple ANOVA-type question and has nothing to do with the fact that the three x's are bound together in such an equation.
regards
rogojel
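rogojel's ANOVA framing might look like this on synthetic data (scipy and the group means are my assumptions, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up data where x1 tends to run higher than x2 and x3.
x1 = rng.normal(5.0, 1.0, size=200)
x2 = rng.normal(3.0, 1.0, size=200)
x3 = rng.normal(3.2, 1.0, size=200)

# One-way ANOVA: do the mean levels of the three predictors differ?
f_stat, p_value = stats.f_oneway(x1, x2, x3)
print(f_stat, p_value)  # a tiny p-value here: the group means clearly differ
```

For real data with heavy skew, a non-parametric alternative such as the Kruskal-Wallis test would be the fallback rogojel mentions.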