# Thread: Assumptions of logistic regression

1. ## Assumptions of logistic regression

Does anyone know a good, authoritative source for the assumptions of logistic regression? They obviously differ from those of OLS, and I have not found a good source for this (some suggest that there are virtually no assumptions other than independence of observations).

2. ## Re: Assumptions of logistic regression

Here is an example of what I mean (if I understand this correctly, it means homoscedasticity is not an issue for logistic regression).

As mentioned previously, the independent or predictor variables in logistic regression can take any form. That is, logistic regression makes no assumption about the distribution of the independent variables. They do not have to be normally distributed, linearly related or of equal variance within each group. The relationship between the predictor and response variables is not a linear function in logistic regression; instead, the logistic regression function is used, which is the logit transformation of q:
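
The logit transformation referred to here is conventionally written as below. This is the standard textbook form; the symbols β₀ … β_k and x₁ … x_k are assumed notation, not taken from the quoted source.

```latex
\operatorname{logit}(q) = \ln\!\left(\frac{q}{1-q}\right)
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
q = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```
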
Actually, of course homoscedasticity won't be an assumption. D'oh.

Hey, I don't have to worry about violating assumptions (other than multicollinearity) with this... I like it.

3. ## Re: Assumptions of logistic regression

While you don't have to worry about violating assumptions, you still face the same question as with OLS: does the model actually describe the phenomenon under investigation? OLS makes assumptions about the structure of that relationship that we try to match to the 'reality' our data describes. It doesn't always work, so we search for transformations and remedial measures to get it to fit appropriately. Nonlinear regression (including logistic) gives you a lot more leeway in describing the structure of that relationship, but you are put more into a position of ignorance about what form that should take, what parameters should go into the model, etc. The lack of assumptions can be liberating or scary depending on your approach, experience, and knowledge of the data!

4. ## The Following User Says Thank You to bryangoodrich For This Useful Post:

noetsi (11-23-2011)

5. ## Re: Assumptions of logistic regression

I haven't used this text but I've seen it cited enough times:

Categorical Data Analysis (Agresti, 2002)

6. ## The Following User Says Thank You to Jake For This Useful Post:

noetsi (11-23-2011)

7. ## Re: Assumptions of logistic regression

Logistic regression is just a specific type of Generalized Linear Model. Learning about GLMs in general will further your understanding of what's going on, but there are some issues unique to logistic regression.

8. ## The Following User Says Thank You to Dason For This Useful Post:

noetsi (11-23-2011)

9. ## Re: Assumptions of logistic regression

Discovering Statistics Using SPSS 3rd edition by Andy Field

p. 273 lists three assumptions of logistic regression:

1) Linearity
2) Independence of errors
3) Multicollinearity, or rather the absence of multicollinearity, in your data
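
As a rough illustration of the third item, here is a minimal sketch of a multicollinearity check for the special case of two predictors, where the variance inflation factor reduces to 1 / (1 - r²). The data are simulated and the function names are made up for this example; none of it comes from Field's book.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

rng = random.Random(3)
x1 = [rng.gauss(0, 1) for _ in range(2000)]
x2_independent = [rng.gauss(0, 1) for _ in range(2000)]
# x2_collinear is x1 plus a little noise, so the two are almost redundant.
x2_collinear = [v + 0.3 * rng.gauss(0, 1) for v in x1]

vif_independent = vif_two_predictors(x1, x2_independent)
vif_collinear = vif_two_predictors(x1, x2_collinear)
print(f"VIF, unrelated predictors:   {vif_independent:.2f}")
print(f"VIF, near-duplicate columns: {vif_collinear:.2f}")
```

A VIF near 1 means the predictors carry separate information; a large value suggests the coefficient estimates will be unstable, in logistic regression just as in OLS.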

10. ## The Following User Says Thank You to Ventures For This Useful Post:

noetsi (11-23-2011)

11. ## Re: Assumptions of logistic regression

In logistic regression, if you have a continuous predictor, the assumption is a linear relationship between the logit and the continuous predictor variable. Another assumption is that the outcome can in fact be modelled with a binomial/multinomial distribution.
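
One informal way to eyeball the linearity-in-the-logit assumption is to bin the continuous predictor and look at the empirical logit in each bin. The sketch below does this on simulated data; the bin count, the 0.5 continuity correction, and the function name are all choices made for illustration, not a standard prescribed by any of the sources mentioned in this thread.

```python
import math
import random

def empirical_logits(x, y, n_bins=5):
    """Split the data into equal-count bins of x; return (mean x, empirical logit) per bin."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    out = []
    for b in range(n_bins):
        # The last bin absorbs any leftover observations.
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        n = len(chunk)
        successes = sum(yi for _, yi in chunk)
        # Adding 0.5 to each cell keeps the logit finite if a bin has 0 or n successes.
        logit = math.log((successes + 0.5) / (n - successes + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / n
        out.append((mean_x, logit))
    return out

# Simulated data in which the logit really is linear in x: logit(p) = -1 + 0.8 * x
rng = random.Random(0)
x = [rng.uniform(-3, 3) for _ in range(5000)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1 + 0.8 * xi))) else 0 for xi in x]

points = empirical_logits(x, y)
for mean_x, logit in points:
    print(f"mean x = {mean_x:6.2f}   empirical logit = {logit:6.2f}")
```

If the printed logits rise (or fall) roughly in a straight line against the bin means, the assumption looks reasonable; a clear curve would suggest a transformation or extra terms.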

12. ## The Following User Says Thank You to d21e7x11 For This Useful Post:

noetsi (11-23-2011)

13. ## Re: Assumptions of logistic regression

I find the binomial assumption a bit puzzling because (according to what I have seen in some texts) if your observations are independent this assumption is always met. If you don't have independence you have other, major problems.

Does anyone know if autocorrelation is as much a violation of logistic regression as it is of OLS?

Thanks all for the comments. It really helps.

14. ## Re: Assumptions of logistic regression

Originally Posted by noetsi
I find the binomial assumption a bit puzzling because (according to what I have seen in some texts) if your observations are independent this assumption is always met. If you don't have independence you have other, major problems.
Not necessarily. You can still have independence and the assumption that a binomial is the right distribution could be wrong. I have a situation in mind, one that isn't even too implausible, that wouldn't work with the assumptions. But then again, in my situation, if what we're actually looking at is Bernoulli outcomes instead of binomial, then it really doesn't matter and it collapses to the same thing.

The problem I'm thinking of is this: in the logistic regression case, if we're working with actual binomial data, then given the levels of the covariates we expect the outcome to follow a binomial distribution with a single fixed success probability. But it's plausible that we might see some overdispersion. It might be overly optimistic to believe that, given the covariates, the success probability is exactly the same for any observation we might make with those covariate levels. One way to fix this might be to assume that the outcome follows a beta-binomial distribution, where the mean of that beta-binomial is what we do the regression on. This allows added variability. The problem gets a little bit more difficult if you do this, but it might do a better job of capturing what's going on.

Although in practice one might just fit a mixed GLM instead with some normal random effects...
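
To make the overdispersion point concrete, here is a small simulation sketch. It compares the observed variance of grouped counts with the variance a binomial would imply, once for a true binomial and once for a beta-binomial with the same mean. The Beta(3, 7) parameters, the group sizes, and the helper names are arbitrary choices for illustration.

```python
import random
import statistics

def dispersion_ratio(counts, n_trials):
    """Observed variance of the counts divided by the binomial variance n*p*(1-p)."""
    p_hat = sum(counts) / (len(counts) * n_trials)
    binomial_var = n_trials * p_hat * (1 - p_hat)
    return statistics.variance(counts) / binomial_var

def draw_count(rng, n_trials, p):
    """One binomial draw, built as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n_trials))

rng = random.Random(1)
n_groups = 2000

# Binomial data: every group of 20 trials shares the same success probability 0.3.
binom = [draw_count(rng, 20, 0.3) for _ in range(n_groups)]

# Beta-binomial data: each group's probability is itself drawn from Beta(3, 7), mean 0.3.
beta_binom = [draw_count(rng, 20, rng.betavariate(3, 7)) for _ in range(n_groups)]

# Bernoulli data (one trial per group) with the same varying probabilities.
bernoulli = [draw_count(rng, 1, rng.betavariate(3, 7)) for _ in range(n_groups)]

ratio_binom = dispersion_ratio(binom, 20)
ratio_beta_binom = dispersion_ratio(beta_binom, 20)
ratio_bernoulli = dispersion_ratio(bernoulli, 1)
print(f"binomial, 20 trials:      {ratio_binom:.2f}")
print(f"beta-binomial, 20 trials: {ratio_beta_binom:.2f}")
print(f"varying p, 1 trial:       {ratio_bernoulli:.2f}")
```

The ratio should sit near 1 for the binomial data and well above 1 for the beta-binomial data. With a single trial per observation the extra variability cannot show up at all, which is the sense in which the Bernoulli case "collapses to the same thing".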

15. ## The Following User Says Thank You to Dason For This Useful Post:

d21e7x11 (11-25-2011)

16. ## Re: Assumptions of logistic regression

I suspect that commercial software does not even address beta-binomial distributions (which, to be honest, I have never heard of). While the binomial assumption may not always be met, in practice there is no way (as far as I know) to test for this. I think you probably need to be more concerned that independence is met (which there is no real test for, and which I imagine gets violated in real-world data, particularly surveys tied to geographic regions, which are common).

Another assumption, I imagine, not really discussed so far is that you measure things without error - a common regression assumption that to me is absurd. It is why I like SEM, where you don't make this assumption.

17. ## Re: Assumptions of logistic regression

Originally Posted by noetsi
I suspect that commercial software does not even address beta-binomial distributions (which, to be honest, I have never heard of).
Probably not! Which is why I added on the random effects at the end there.
While the binomial assumption may not always be met, in practice there is no way (as far as I know) to test for this.
This is why I mentioned that if we truly have binomial data - not Bernoulli - then we can look at the binomial assumption. Like I said, if we only have Bernoulli data then it really doesn't matter, and I think things just collapse down to what we want anyway.
Another assumption, I imagine, not really discussed so far is that you measure things without error - a common regression assumption that to me is absurd.
I don't necessarily think of it that way. Sure we might not measure things perfectly (I'm assuming we're measuring the response perfectly...) but for the covariates it doesn't bother me. We want to regress based on what we actually can observe. So instead of regressing on X we're regressing on "observed X" which to me is fine because that's what we're actually going to use in practice.

18. ## Re: Assumptions of logistic regression

According to my text (which has a painful way of being wrong...), measuring something with error has significant implications: it attenuates the observed parameters relative to the true ones.

Of course, like William Berry (an expert in regression who wrote monographs on the topic), I doubt we can ever know the true parameter. So we agree that this concern is probably overblown in practice.

It does raise the question, as with independence, of why you make assumptions that you can never test and that are almost certainly not correct.
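
The attenuation effect is easiest to see in a simple OLS simulation, so the sketch below uses ordinary least squares rather than logistic regression. With classical measurement error, the expected slope shrinks by the factor var(x) / (var(x) + var(error)); the same direction of bias is generally expected in logistic regression, though the exact factor differs. All numbers here are made up for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 20000
true_slope = 2.0

x_true = [rng.gauss(0, 1) for _ in range(n)]              # variance 1
y = [true_slope * xi + rng.gauss(0, 1) for xi in x_true]
x_observed = [xi + rng.gauss(0, 1) for xi in x_true]      # measurement error, variance 1

slope_true_x = ols_slope(x_true, y)
slope_observed_x = ols_slope(x_observed, y)
print(f"slope using the true x:     {slope_true_x:.2f}")
print(f"slope using the observed x: {slope_observed_x:.2f}")
```

With equal variances for the predictor and its measurement error, the attenuation factor is 1 / (1 + 1) = 0.5, so the second slope should land near half of the first.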

19. ## Re: Assumptions of logistic regression

Hi,

Originally Posted by d21e7x11
In logistic regression, if you have a continuous predictor, the assumption is a linear relationship between the logit and the continuous predictor variable. Another assumption is that the outcome can in fact be modelled with a binomial/multinomial distribution.
How do my assumptions change if I have categorical predictors (e.g. different sites of data collection)? I'm pretty sure my data could never fit a linear relationship here, but do they have to? Or are there then other assumptions that have to be met?
(Noetsi, I wish I could also refer to your suggested books for help, but I can't find them =( )

cheers,
Katharina

20. ## Re: Assumptions of logistic regression

Katharina, you can think about the assumptions of logistic regression in the same way as the assumptions of linear regression (more precisely, the general linear model), but now the outcome is the logit of the probability of a "positive" response.
So for categorical predictors, in both linear and logistic regression, there cannot be an assumption of linearity, due to the very nature of categorical predictors :-)

21. ## Re: Assumptions of logistic regression

Originally Posted by d21e7x11
there cannot be an assumption of linearity, due to the very nature of categorical predictors :-)
Well there can be that assumption. It's just relatively easy to meet since you're only assuming that a line can fit two points (which is always true). So linearity for categorical stuff isn't bad.
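
Here is a small sketch of why linearity is trivially satisfied for a categorical predictor. It assumes the predictor enters the model through dummy (indicator) coding with one reference level, which is the usual default but is an assumption here; the site names and counts are hypothetical. Each non-reference level gets its own coefficient, so the fitted probabilities reproduce the observed proportions exactly.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical (successes, trials) at three data-collection sites.
sites = {"A": (30, 100), "B": (45, 100), "C": (12, 80)}
proportions = {name: k / n for name, (k, n) in sites.items()}

# Dummy coding with site A as the reference level:
# the intercept is the log-odds at A, and each coefficient is a difference in log-odds from A.
intercept = logit(proportions["A"])
coefficients = {name: logit(p) - intercept
                for name, p in proportions.items() if name != "A"}

# Fitted probability for each site from intercept + that site's coefficient.
fitted = {"A": inv_logit(intercept)}
for name, coef in coefficients.items():
    fitted[name] = inv_logit(intercept + coef)

for name in sites:
    print(f"site {name}: observed {proportions[name]:.3f}, fitted {fitted[name]:.3f}")
```

Each dummy variable only takes the values 0 and 1, so every coefficient is just the "line through two points" described above, and there is no functional form left to get wrong.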