# Unobserved heterogeneity in logistic regression

#### Jake

Pasted from the TS chatbox:

Jake: i'm working on a blog post about the whole unobserved-heterogeneity-in-logistic-regression thing
Jake: it's like 1/3rd written
spunky: I was curious about what was going on with that blog. I'm writing a post myself about a somewhat surprising linear-algebra result involving Euclidean distances that serves as a warning about why it's difficult for people to think multivariately.
spunky: When you talk about unobserved heterogeneity are you referring to measurement error? Or the missing variable problem?
spunky: Or both?
Jake: about my blog post: the latter. sometimes this is phrased as an omitted variable problem and sometimes phrased as a latent variable heterogeneity issue
Jake: not measurement error per se, although you could perhaps consider the latent variable interpretation to be related to measurement error... sort of... maybe....
spunky: My question has always been whether we can tell apart the two (i.e. when is it measurement error and when is it an omitted variable). But you have a better grasp of DAGs than I do so maybe you can help us tell them apart?!
spunky: Well, most errors-in-variables models can be re-parameterized as SEMs as long as the error is additive, I think.
Jake: with only a single binary outcome there's no way to tell anything without adding further assumptions/constraints
Jake: with multiple binary items you could do some fancier modeling that would help. i mean it still requires bringing in some assumptions but they're not crazy
spunky: Wait... so you're talking about measurement error in the DV? The 0s and 1s?
Jake: yes. not the predictors. that's old news, i already published a paper about it and everything
Jake: but, i mean, NO, you said measurement error, not me!
Jake: i don't think that's the right way to think about the issue i'm writing about
spunky: Unobserved heterogeneity in 0s and 1s only? Omg... that sounds kinda crazy to be honest. I don't think I even have a good grasp about how to conceptualize it. I thought you were talking about the predictors like we all do!
Jake: the issue i'm writing about is the one discussed by Allison (1997) and Mood (2010) which claims that logistic regression has big problems related to unobserved heterogeneity
Jake: it's like a counter-point basically
spunky: So walk me through this. How does unobserved heterogeneity manifest itself in discrete outcomes? Like some sort of misclassification rate?
Jake: i think there are some atypical cases where their arguments hold and are relevant, but not most of the time
Jake: well there are two ways to think about it.
spunky: Measurement error and missing variables, I'm sure
Jake: in the first way of thinking about it, we suppose that the binary variable is the result of an underlying, unobserved continuous variable (with a normal or logistic distribution, corresponding to probit or logit regression respectively) that was subjected to a thresholding process
Jake: let Y be the observed binary outcome and Y* be the latent continuous variable
spunky: Ok, I'm familiar with that. That's the typical setting for ordinal SEM
Jake: so then the problem is if we observe group differences in Pr(Y=1), we can't automatically assume it's because the mean of Y* shifted up for one group relative to the other. it could be due to a difference in var(Y*). or even to a difference in where the threshold is
Jake: which is all fine and good, but it all hinges crucially on believing that an underlying continuous Y* makes sense AND that it's really Y* that we care about, not Y
spunky: Totally makes sense and I have a hard time believing we ever care all that much about Y*. Heck, the whole field of IRT is based on that and sometimes even I question it.
Jake: which is sometimes true, but not usually. more often, when we're using logistic regression on some Y, we just want to know what X's render Y=1 to be more likely. and we could give a **** about any underlying Y*
Jake: plus, in the subset of cases where we DO care about Y* rather than Y, usually in those cases researchers will have taken care to collect more than one indicator of Y*
Jake: in which case you can apply fancier models that more or less solve the problem
Jake: so this whole interpretation is only ever a problem if we care about Y*, not Y, and we only have a single indicator. which, you know, i'm sure happens sometimes. but not that much i think
Jake: okay so that first interpretation is pretty easy to handle i think
spunky: So case 1 is out. Unobserved heterogeneity isn't really a thing under the latent model for logistic regression and I totally agree with you. What's case #2?
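To make the latent-variable point concrete, here's a small simulation sketch (my own toy numbers, not from the chat): two groups share exactly the same latent mean, differ only in the latent scale, and still end up with clearly different Pr(Y=1) after thresholding — the kind of difference Allison/Mood warn could be misread as a mean shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Both groups share the same latent mean; only the latent scale differs.
mu = 1.0
ystar_a = mu + rng.logistic(loc=0.0, scale=1.0, size=n)  # group A: scale 1
ystar_b = mu + rng.logistic(loc=0.0, scale=2.0, size=n)  # group B: scale 2

# Thresholding Y* at 0 produces the observed binary outcomes Y.
y_a = (ystar_a > 0).astype(int)
y_b = (ystar_b > 0).astype(int)

p_a, p_b = y_a.mean(), y_b.mean()
print(p_a, p_b)  # roughly 0.73 vs 0.62, despite identical latent means
```

The analytic values are 1/(1+e^-1) ≈ 0.731 and 1/(1+e^-0.5) ≈ 0.622, so the observed gap in Pr(Y=1) here is entirely a var(Y*) story — which only matters if you care about Y* in the first place.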
Jake: the second interpretation is a bit trickier
Jake: it turns out that you can arrive at pretty much the same place without ever invoking a latent continuous Y*
Jake: instead you just suppose that there are unobserved/omitted X's that are associated with Y -- but that DO NOT need to be correlated with the other X's
Jake: they can be totally orthogonal to all the observed X's -- so they're not technically confounders -- nevertheless, unlike in OLS regression, the presence of these "competing exposures" as they're sometimes called will alter the betas for the observed X's that you care about
spunky: Uhm... OKish. The "not correlated" part may be a little bit difficult to believe though. But ok, I can live with that.
Jake: a phenomenon known as non-collapsibility
Jake: so an intuitive way to describe the problem this creates is like this
Jake: say we have an effect of X on Y, plus a covariate Z -- men vs. women -- that affects Y but is uncorrelated with X
spunky: Ok...
Jake: but there's no interaction. among both men and women, the effect of X on Y (i.e., the logistic regression beta) is, say, 2
spunky: Ok, I follow..
Jake: now if we fit a model where we omit Z from the regression equation, the beta for X in that smaller model will generally be attenuated toward 0. it might be, say, beta=1
Jake: but if beta=2 for every member of the population -- that is, for both men and women -- then what does this beta=1 even refer to?
Jake: it's not an average effect across men and women, because any average would be 2
spunky: Uhm..
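The attenuation Jake describes is easy to reproduce in a toy simulation (my own made-up coefficients and sample size, not from the chat; the logistic MLE is fit with a hand-rolled Newton-Raphson so the sketch has no dependencies beyond numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)
z = rng.integers(0, 2, size=n)        # "sex", independent of x: NOT a confounder
eta = 2.0 * x + 3.0 * (z - 0.5)       # beta = 2 for x in both groups, no interaction
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

def logit_fit(Xk, y, iters=30):
    """Plain Newton-Raphson logistic regression MLE (intercept added)."""
    A = np.column_stack([np.ones(len(y)), Xk])
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(A @ b, -30, 30)))
        H = (A * (p * (1 - p))[:, None]).T @ A   # Hessian of the log-likelihood
        b += np.linalg.solve(H, A.T @ (y - p))   # Newton step
    return b

b_full = logit_fit(np.column_stack([x, z]), y)  # recovers beta for x of about 2
b_small = logit_fit(x, y)                       # omits z: attenuated toward 0
print(b_full[1], b_small[1])  # the small-model beta is noticeably below 2
```

Even though z is orthogonal to x, dropping it pulls the beta for x well below 2 — the non-collapsibility behavior the thread is about. No such attenuation happens in the analogous OLS setup.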


#### Jake

spunky: And this only works when you have like a continuous covariate and a categorical predictor?
spunky: Or is it irrespective of the type of data?
Jake: so now we can imagine that for any logistic regression equation that we estimate, there's almost certainly lots of variables we omitted that affect Y, even if they're not confounders (not associated with the X's). and we can't interpret the beta we estimated as being any sort of average across those unobserved covariates. so how can we say that the beta we estimated has any meaningful interpretation?
Jake: the covariates and predictors can all be continuous or categorical, doesn't matter
Jake: i don't agree with this line of reasoning BTW, just trying to lay it out in a way that makes it sound appealing, that is, the way it's been explained as a problem in the past
spunky: Is the assumption of this covariate being uncorrelated with the other predictors and only with Y necessary?
Jake: when you describe it this way, it does kind of sound like a paradox of some kind, right?
spunky: Yeah. Very Simpson's-paradox-like, I must say
spunky: I'm just wondering how common it actually is?
spunky: Or it even looks like suppression in OLS regression.
Jake: it's not that it's necessary, it's that obviously things get even worse if the omitted variables are correlated with the observed X's. because then they're confounders too!
spunky: Actually, it looks A LOT like suppression.
Jake: the point is to say that even in the best possible case, where we have no confounders, it's still (allegedly) a problem
spunky: But you argue this... isn't?
spunky: I mean, omitted variables seem to be as common as the air we breathe in data analysis
Jake: right. i mean, it's a behavior that we should be aware of. but it's not a "problem"
spunky: Why wouldn't you consider it a "problem"?
Jake: the mistake, i argue (and others have argued recently), lies in viewing the beta from the model that omits Z as somehow estimating the same thing as the beta from the model that includes Z. so that if the former beta doesn't match the latter beta, it's "biased" in some way
spunky: OH. I see. So it's more of a "model-dependent" type inference?
spunky: Like "If I only care about making inferences about X, what's the problem if Z is missing"?
Jake: the resolution is to say that these betas estimate two totally different effects, and there's no reason in general to expect beta_small to be close to beta_large and that is just fine. specifically, this is logistic regression right? so basically we are estimating Pr(Y | predictors). in the small model we are looking at Pr(Y | X), while in the large model we are looking at Pr(Y | X, Z). we know from probability theory that these are different and generally unequal things
Jake: it happens to turn out that in OLS, the regression coefficients have this *collapsibility* property, so that the beta in the small model equals the beta in the large model as long as the omitted variables are not confounders. but that is a very special property of OLS and not true in general
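For contrast, the collapsibility of OLS can be checked the same way (again a toy simulation with made-up numbers: z is independent of x, so it's not a confounder, and the population slope on x is the same whether or not z is in the model):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(size=n)
z = rng.normal(size=n)              # independent of x: not a confounder
y = 2 * x + 3 * z + rng.normal(size=n)

def ols_coefs(columns, y):
    """Least-squares coefficients with an intercept prepended."""
    A = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(A, y, rcond=None)[0]

b_small = ols_coefs([x], y)         # omits z
b_full = ols_coefs([x, z], y)       # includes z
print(b_small[1], b_full[1])        # both slopes on x are ~2
```

Omitting z inflates the residual variance but leaves the slope on x untouched — exactly the behavior people (wrongly) expect to carry over to logistic regression.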
Jake: so people assume that that's how things *ought* to be and, if it's not true, that's a *problem*
spunky: My guess is that people would call this a "problem" because we live in a world where you're expected to adjust for everything under the sun and more and only then can you claim to have evidence for any of the effects you argue for?
Jake: but there is absolutely no reason to suppose that that should be true in general, and thus, no reason to consider beta_small to be a somehow biased estimate of beta_large
Jake: i really just think it's based on a naive assumption that the way OLS works is the way everything ought to work
spunky: But you just said even for OLS if you omit a predictor the regression coefficients would change if the model changes. Which is kind of a *duh* realization but apparently when you move the same insight into logistic regression people call it something else?
spunky: I mean, when was the last time in OLS regression that you changed the model and the coefficients stayed the same?
Jake: no, for OLS, omitting a predictor only changes the slope (in the population) *if the omitted variable is a confounder*
spunky: Now, just so I don't get lost here... "confounder" is defined as some X that is correlated with the other Xs of interest, rite?
Jake: right. i think it's best defined in causal terms, but for our purposes here, it's fine for now to say it's a confounder if it's correlated with both Y and X
Jake: so going back to the omitted male vs. female indicator. we can say that the small model with beta=1 tells us about Pr(Y | X) -- which is *not* in general the same as Pr(Y | X, Z) for any particular value of Z.
spunky: I guess I've just spent so much time around correlation matrices that my immediate reaction is to say everything is a confounder because almost everything is correlated with everything else in some way. But I can see the point that you're making now. In OLS regression this is not a problem if the omitted X is not correlated with the other Xs. In logistic regression it is a problem even if it is not correlated, and somehow people don't like that.
spunky: But what I'm interpreting here (and please correct me if I'm wrong) is something along the lines of "people are losing their **** because when they change the model they expect the estimates to remain the same...
spunky: ... even if the model isn't...
Jake: sort of. they're losing their minds because they realize that even under idealized conditions (no confounders) the estimates will generally change when you change the model
Jake: not just that this change happens "in the real world"
Jake: now, an interesting and kind of intuitive behavior here is that these omitted variables generally cause attenuation of the betas toward 0. and as you include more and more of those variables, the betas grow in magnitude toward +/- infinity, i.e., the conditional probabilities approach 1 or 0. which is like saying, the more features we learn about the observations, the more certain we get about whether we expect Y to be 1 or 0. which makes sense
spunky: Uhm. Yeah, I can see that.
Jake: as i condition on more and more informative predictors, i become pretty **** sure for each case
Jake: so, i mean, the behavior is surprising for people who are used to working with OLS, but it's not like it's crazy, it does make intuitive sense from other perspectives
Jake: it would be weird if conditioning on more and more predictors, all of which really do add information, somehow DIDN'T push the conditional probabilities toward 1 or 0
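That intuition can be sketched numerically too (toy numbers of my own, hand-rolled Newton fit): simulate ten equally informative predictors, fit logistic models on growing subsets of them, and watch the fitted probabilities move away from 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k_total = 50_000, 10

X = rng.normal(size=(n, k_total))
beta_true = np.ones(k_total)                 # every predictor is informative
p_true = 1 / (1 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p_true)

def logit_fit(Xk, y, iters=30):
    """Plain Newton-Raphson logistic regression MLE (intercept added)."""
    A = np.column_stack([np.ones(len(y)), Xk])
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(A @ b, -30, 30)))
        H = (A * (p * (1 - p))[:, None]).T @ A
        b += np.linalg.solve(H, A.T @ (y - p))
    return A, b

# Sharpness = how far the fitted probabilities sit from 0.5, on average.
sharpness = {}
for k in (1, 5, 10):
    A, b = logit_fit(X[:, :k], y)
    p_hat = 1 / (1 + np.exp(-np.clip(A @ b, -30, 30)))
    sharpness[k] = np.mean(np.abs(p_hat - 0.5))
print(sharpness)  # increases with k: more predictors push probabilities toward 0/1
```

Each added predictor is orthogonal to the rest, yet conditioning on more of them both grows the betas and drives the conditional probabilities toward the endpoints — the "more information, more certainty" behavior Jake describes.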
Jake: oh yeah, and one reason why this matters is that some people use this line of reasoning to justify using the (ugh) "linear probability model" for binary outcomes
spunky: Dang it! We'll need to catch up later then cuz I do like the LPM

#### hlsmith

##### Omega Contributor
Will you use DAGs similar to those in Miguel Hernan's MOOC related to error dependence?

#### Jake

Thanks, yes, this is an excerpt from Pearl's book, which I own and have read. I've definitely re-read that section several times while working on my blog post.

#### noetsi

##### Fortran must die
I never got a sense in Allison that this is only an issue when you assume a latent variable behind the observed binary. You don't have to assume this latent variable even exists (I never do) and you can still conduct logistic regression.

#### Jake

Yes, I agree. We discussed that in the chatbox starting with the following two chats:
Jake: the second interpretation is a bit trickier
Jake: it turns out that you can arrive at pretty much the same place without ever invoking a latent continuous Y*

#### spunky

##### Doesn't actually exist
I feel it's important to mention, however, that we should always prefer a latent variable approach over a no-latent-variable approach.

Remember spunky's razor:

entia sunt multiplicanda praeter necessitatem

#### noetsi

##### Fortran must die
> I feel it's important to mention, however, that we should always prefer a latent variable approach over a no-latent-variable approach.
>
> Remember spunky's razor:
>
> entia sunt multiplicanda praeter necessitatem

Why? It may make no substantive sense to assume such a variable even exists.

#### spunky

##### Doesn't actually exist
> Why? It may make no substantive sense to assume such a variable even exists.

It's a joke, @noetsi . If you look up spunky's razor you can see I cleverly edited the original statement from Occam's razor to highlight the irony.

It's also part of a wider criticism I have of people using statistical analyses to make pseudo-causal claims about constructs in social science (education and psychology in particular, because those are my areas). There's this thing that Denny Borsboom accurately refers to as the "all-encompassing black hole known as construct validity": in my field (actually, I should say *our* field, because you also come from an Education background) there's this weird assumption that construct = latent variable, which is blatantly wrong. But you hear about it over and over again.

I'm also starting to get a little bit sick of the whole "hidden moderator" argument but I guess that will become my pet peeve for this year.

#### noetsi

##### Fortran must die
I actually come from political science/public administration. I got an applied statistics degree in education because theirs was the only program that offered one.

There are lots of problems with statistics in the social sciences for sure. One of which is different authors using different words to refer to the same exact technique - something that continually confuses me.

#### hlsmith

##### Omega Contributor
@spunky I hadn't heard that term, but I am intrigued by the idea that you conduct a study in LA and try to transport the results to NY, but NY has a hidden moderator that wasn't present in LA. The issue is whether you can ever transport results to a new population at all.