Yes
No
Yes, some but not all
Yes, but other things come into play
It depends on whether you use logistic regression or linear regression
It depends on other things
I better give you an answer in text
Is it true that when one just wants to control for/adjust for variables in linear or logistic regression models, the variables one wants to control for/adjust for do not have to meet the assumptions of these models?
For instance, if one wants to control for age, sex and education (common confounders) in a linear regression, we don't have to worry about e.g. age being linearly related to the outcome variable or whether the assumption of homoscedasticity is met. I am told that this is supposed to be so because when we control/adjust for confounders we are not necessarily interested in the estimates (beta coefficients) of these variables. Supposedly, violating these assumptions does not affect the estimates of the beta coefficients we are interested in, i.e. those of variables other than age, sex and education.
Is it true that we don't have to worry about violating any assumptions when we just want to control/adjust for variables? Does anybody know? If true, are there other things we have to keep in mind when just controlling/adjusting for variables? Thanks a lot!
Please, anyone? This should be an easy question to answer. I would appreciate it a lot, as the work on my thesis has stalled.
Formally, it is not true that the regression assumptions can be ignored for variables you include only for statistical control rather than analysis. Or rather, I have never seen any discussion of the assumptions suggest this; the focus is on the model, not individual variables. In some cases, like linearity, one variable violating the assumption won't necessarily influence another variable it is not interacting with, but I do not know if this is generally true for all assumptions. A professor of mine suggested the assumption that all variables are quantitative (counting dummy variables as such) could lead variables to take on nonsensical values even if they were quantitative.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Not sure what you're getting at here but if the point is to control for certain variables then honestly I would think that controlling for it appropriately would be a prime concern. For instance if you only include a linear term when the true relationship is quadratic - then you still haven't fully controlled for that variable.
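To make the point about an under-specified control concrete, here is a minimal simulation sketch in Python (NumPy only). Everything in it is made up for illustration: the variable names, the coefficients and the helper `ols_coefs` are assumptions, not anything from this thread. A confounder (`age`, centered and rescaled) drives both the exposure and the outcome through its square, and the true effect of the exposure is set to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: age is centered and rescaled to [-1, 1].
age = rng.uniform(-1, 1, n)
# Both exposure and outcome depend on age squared (assumed for illustration).
exposure = 3.0 * age**2 + rng.normal(0, 1, n)
# The true effect of the exposure on the outcome is exactly zero.
outcome = 0.0 * exposure + 2.0 * age**2 + rng.normal(0, 1, n)

def ols_coefs(y, *columns):
    """Least-squares coefficients: intercept first, then one per column."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Exposure coefficient when age enters only as a linear term.
b_linear_only = ols_coefs(outcome, exposure, age)[1]
# Exposure coefficient when age and age squared are both included.
b_with_square = ols_coefs(outcome, exposure, age, age**2)[1]

print(f"linear age only:      {b_linear_only:.3f}")
print(f"age plus age squared: {b_with_square:.3f}")
```

Under these assumptions the exposure coefficient from the linear-only adjustment should sit clearly away from zero, while the one adjusted for age and age squared should be close to zero. The bias appears here because the exposure itself depends on age in a nonlinear way.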
I have no idea what this is supposed to mean: "A professor of mine suggested the assumption that all variables are quantitative (counting dummy variables as such) could lead variables to take on nonsensical values even if they were quantitative."
I don't have emotions and sometimes that makes me very sad.
My first comment, the first one quoted, means that the assumptions behind regression apply to the overall model, not to any specific variable in it. So a violation by one variable won't mean the whole model violates the regression assumptions. The fact that one variable is not linearly related to the outcome does not mean the other variables are not, or that they cannot be interpreted as linear. That is, a violation of the assumptions by one variable does not invalidate the whole model.
The second statement you cited means that if you have (for example) a variable with 7 levels in the model (not a dummy; seven specific levels in one variable), it will (or can) make it impossible to interpret other variables in the model, despite the fact that no assumptions are made about the distribution of the IVs.
It is the only time I have heard this, but the individual in question has a PhD in stats from Harvard and is clearly brilliant, so I give what he said (which was not really subject to misinterpretation in the conversation it was part of) a lot of credit. We were talking about the practice in SS of having Likert scale variables (not dummies, variables with 5-7 levels) in models. One professor asked, given that there are no IV distribution assumptions, whether any distribution of an IV could cause a problem. And that was the response the other professor gave. He said specifically that it could lead to other variables having nonsensical slopes.
Thanks a lot, both of you!
This makes sense. And it makes me a bit worried too, thinking about all the epidemiological studies controlling for age, for instance, which often may not be linearly related to the outcome variable. Then again, in a logistic regression, which does not have an assumption of linearity, we shouldn't have to worry about the distribution of the age variable. Do you agree?
Do you think the same reasoning you made for the assumption of linearity applies to the assumption of equal variances/homoscedasticity?
I think that quote is including some of your response.
If you add a continuous variable in a logistic regression model you assume a linear effect in the log(odds), so logistic regression does involve a linearity assumption.
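Written out, and assuming a model with a single exposure $x$ and age as the only control (the symbols are illustrative, not from this thread), the linearity assumption sits on the log-odds scale:

```latex
\log\frac{P(Y = 1 \mid x, \text{age})}{1 - P(Y = 1 \mid x, \text{age})}
  = \beta_0 + \beta_1 x + \beta_2\,\text{age}
```

If the log-odds are actually curved in age, one common remedy is to add a term such as $\beta_3\,\text{age}^2$, or a spline in age, to the right-hand side.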
kristian (03-08-2014)
Only with the logit not the raw data, even if the IV is interval.
OK, but to clarify, does this also apply when you just want to control for a variable?
Dason partly answered this question, saying that if the relationship is quadratic, you are probably not fully controlling for it. In other words, this kind of violation doesn't sound like it would influence our results in a dramatic way. But what if the control variable is positively related to the outcome up to a certain point (taking age as an example, say up to 65 years), and negatively related after that? It sounds to me like such a violation is much more dramatic.
Is the conclusion here that thoughtless "controlling for", of which I suspect there is a lot in the research reports published around the world, is a considerable source of bias?
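The age scenario described above (rising up to 65, falling afterwards) can be explored with a small simulation. This is only a sketch under invented assumptions: the coefficients, the variable names, the helper `exposure_coef` and the choice of a known turning point at 65 are all hypothetical. The true exposure effect is zero throughout. It compares two cases: an exposure that depends on age linearly, and an exposure that, like the outcome, peaks at 65.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def exposure_coef(outcome, exposure, *age_terms):
    """OLS coefficient of the exposure, adjusted for the given age terms."""
    X = np.column_stack([np.ones(len(outcome)), exposure, *age_terms])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]

age = rng.uniform(30, 90, n)
dist_from_65 = np.abs(age - 65)

# Outcome rises with age up to 65 and falls afterwards (assumed shape).
# The exposure has no effect on the outcome in either scenario.
outcome = -0.1 * dist_from_65 + rng.normal(0, 1, n)

# Scenario A: the exposure depends on age linearly.
exposure_a = 0.05 * age + rng.normal(0, 1, n)
coef_a_linear = exposure_coef(outcome, exposure_a, age)

# Scenario B: the exposure also peaks at 65.
exposure_b = -0.05 * dist_from_65 + rng.normal(0, 1, n)
coef_b_linear = exposure_coef(outcome, exposure_b, age)

# Same scenario, but age is modelled with a hinge term at 65,
# which lets the fitted age effect change slope there.
hinge = np.maximum(age - 65, 0)
coef_b_hinge = exposure_coef(outcome, exposure_b, age, hinge)

print(f"A, linear age adjustment: {coef_a_linear:.3f}")
print(f"B, linear age adjustment: {coef_b_linear:.3f}")
print(f"B, age plus hinge at 65:  {coef_b_hinge:.3f}")
```

In this setup the linear adjustment should be harmless in scenario A, because the part of the exposure left over after removing linear age is unrelated to age. In scenario B it should leave a clearly nonzero coefficient, and the hinge term should bring it back near zero. In real data the turning point is not known in advance, so a more flexible form (several knots, or a spline) would be the more realistic choice.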