What is Endogeneity?

#1
Hi! The title says it all. I am a health care worker currently going through a really interesting paper. I have some experience in statistics but am still quite far from being an experienced biostatistician. I want to go through the methods of their analyses, but I don't understand them very well. I need some help...
They refer to the "endogeneity" (from econometric theory) of certain variables. Then they use specific formulas to test correlations and associations between risk factors and outcomes. I can't find a simple explanation of endogeneity and its use in biostatistics on the internet. All I seem to understand, and you can correct me if I am wrong, is that these complicated formulas might be trying to control for unknown bias, which would make sense because their research question is pretty hard to "unbias". Does anyone know how to explain endogeneity, and how it can be tested on health-related risk factors, in a simple fashion? Thanks!
 

hlsmith

Not a robit
#2
Might be beneficial to provide a citation to the paper, so we can see the context.


When you think of causality, an endogenous variable has a known parent, that is, a variable that comes before it in the causal network. An exogenous variable is a parent without a known parent of its own. Take Z -> X -> Y: when trying to predict Y, X is an endogenous variable because Z comes before it. There are some tests to check whether a variable or predictor is endogenous.
 

hlsmith

Not a robit
#4
Yeah, if I get a chance I may review it tomorrow, but yes, this would be a complex paper for non-statisticians.

Concepts: marginal structural models and instrumental variable analyses, both more modern approaches.
MSM gets at average effects and IV at confounders (via instruments). I believe the z terms are the endogeneity variables.
 
#5
An endogenous variable is something that "is determined within the system" (often called y1, y2, y3...)

An exogenous variable is a variable that "is determined outside of the system" (often called x1, x2, x3...)
(But the statistical description of exogeneity is that the x-variable is statistically independent of the error term, and of endogeneity that it is dependent on the error term.)

Example:

y1 = b12*y2 + g10 + g11*x1 + err1 (1)
y2 = b21*y1 + g20 + g21*x2 + err2 (2)

You can notice that in equation (1) y1 is influenced by y2 and in equation (2) y2 is influenced by y1. So they are influencing each other. They are determined within the system.

Notice that err1 influences y1, and y1 influences y2 (from equation 2), so y2 in equation (1) is not statistically independent of the error term err1. If one tries to estimate each of the two equations with ordinary least squares (OLS), that will give biased and inconsistent estimates.

The above is called a structural model, or a structural equations model (SEM).
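
To make the OLS bias in equation (1) concrete, here is a minimal simulation sketch in plain Python. Everything numeric in it is an assumption chosen for illustration (b12 = b21 = 0.5, the g coefficients, standard normal errors, and a separate error term err2 for equation (2)). The helper names `ols` and `simulate_system` are hypothetical and not taken from the paper.

```python
import random

def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(m[r][p]))
        m[p], m[best] = m[best], m[p]
        for r in range(k):
            if r != p:
                f = m[r][p] / m[p][p]
                m[r] = [a - f * b for a, b in zip(m[r], m[p])]
    return [m[i][k] / m[i][i] for i in range(k)]

def simulate_system(n=20000, b12=0.5, b21=0.5, seed=0):
    """Draw (y1, y2, x1, x2) from structural equations (1) and (2)."""
    rng = random.Random(seed)
    g10, g11, g20, g21 = 1.0, 1.0, 2.0, 1.0  # assumed illustrative values
    y1, y2, x1, x2 = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)    # exogenous x1, x2
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)  # err1, err2
        # Solve (1) and (2) jointly for y1, then obtain y2 from (2).
        v1 = (g10 + g11 * a + e1 + b12 * (g20 + g21 * b + e2)) / (1 - b12 * b21)
        v2 = b21 * v1 + g20 + g21 * b + e2
        y1.append(v1)
        y2.append(v2)
        x1.append(a)
        x2.append(b)
    return y1, y2, x1, x2

y1, y2, x1, x2 = simulate_system()
b12_ols = ols([y2, x1], y1)[1]  # coefficient on y2 in equation (1)
print(f"true b12 = 0.5, OLS estimate = {b12_ols:.3f}")
```

Under these assumed values the OLS estimate of b12 should land clearly above 0.5 (about 0.67 in the large-sample limit), and drawing more data does not close the gap. That is what "inconsistent" means here.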

If you solve for the y's in terms of the x's, you get:

y1 = d10 + d11*x1 + d12*x2 + err3 (3)
y2 = d20 + d21*x1 + d22*x2 + err4 (4)

Such a system is called the reduced form. Now the difficulty is how to get, from the estimates of the reduced form (the d's), estimates for the structural form. That is called the identification problem.
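
As a worked sketch of that step: substituting equation (2) into equation (1) (and vice versa) and collecting terms gives the d coefficients of (3) and (4) explicitly. Here err2 denotes the error term of equation (2), and b12*b21 is assumed to differ from 1.

```latex
\begin{aligned}
\Delta &= 1 - b_{12} b_{21} \\
d_{10} &= \frac{g_{10} + b_{12} g_{20}}{\Delta}, &
d_{11} &= \frac{g_{11}}{\Delta}, &
d_{12} &= \frac{b_{12} g_{21}}{\Delta}, &
err_3 &= \frac{err_1 + b_{12}\, err_2}{\Delta} \\
d_{20} &= \frac{g_{20} + b_{21} g_{10}}{\Delta}, &
d_{21} &= \frac{b_{21} g_{11}}{\Delta}, &
d_{22} &= \frac{g_{21}}{\Delta}, &
err_4 &= \frac{b_{21}\, err_1 + err_2}{\Delta} \\
b_{12} &= \frac{d_{12}}{d_{22}}, &
b_{21} &= \frac{d_{21}}{d_{11}}
\end{aligned}
```

In this particular example each structural equation leaves out one exogenous variable (x2 is absent from (1), x1 from (2)). That is what lets the ratios in the last line recover b12 and b21 from the reduced-form estimates, provided g11 and g21 are nonzero. Without such exclusions the structural parameters could not be recovered this way.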

Sometimes the parameters in the structural form are estimated using instrumental variables, that is, variables that are independent of the error term but strongly correlated with the explanatory endogenous variable.
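
A rough sketch of how that can look in practice is two-stage least squares (2SLS) applied to equation (1), with x2 serving as the instrument for y2. The parameter values, the standard normal errors, and the helper names `ols`, `simulate_system` and `two_stage_least_squares` are all assumptions made up for this illustration.

```python
import random

def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        best = max(range(p, k), key=lambda r: abs(m[r][p]))
        m[p], m[best] = m[best], m[p]
        for r in range(k):
            if r != p:
                f = m[r][p] / m[p][p]
                m[r] = [a - f * b for a, b in zip(m[r], m[p])]
    return [m[i][k] / m[i][i] for i in range(k)]

def simulate_system(n=20000, b12=0.5, b21=0.5, seed=0):
    """Draw (y1, y2, x1, x2) from an assumed two-equation structural system."""
    rng = random.Random(seed)
    g10, g11, g20, g21 = 1.0, 1.0, 2.0, 1.0  # assumed illustrative values
    y1, y2, x1, x2 = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)    # exogenous x1, x2
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)  # structural errors
        v1 = (g10 + g11 * a + e1 + b12 * (g20 + g21 * b + e2)) / (1 - b12 * b21)
        v2 = b21 * v1 + g20 + g21 * b + e2
        y1.append(v1)
        y2.append(v2)
        x1.append(a)
        x2.append(b)
    return y1, y2, x1, x2

def two_stage_least_squares(y1, y2, x1, x2):
    """2SLS for equation (1); returns [g10, b12, g11] estimates."""
    # Stage 1: regress the endogenous y2 on all exogenous variables.
    a0, a1, a2 = ols([x1, x2], y2)
    y2_hat = [a0 + a1 * u + a2 * v for u, v in zip(x1, x2)]
    # Stage 2: equation (1) with y2 replaced by its fitted values.
    return ols([y2_hat, x1], y1)

y1, y2, x1, x2 = simulate_system()
b12_ols = ols([y2, x1], y1)[1]
b12_iv = two_stage_least_squares(y1, y2, x1, x2)[1]
print(f"true b12 = 0.5, OLS = {b12_ols:.3f}, 2SLS = {b12_iv:.3f}")
```

The fitted values from the first stage depend only on x1 and x2, so they are unrelated to err1 by construction. That is why the second-stage coefficient should come out close to the true 0.5 while plain OLS does not. The second-stage standard errors from a naive regression would be wrong, so real analyses use a dedicated IV routine rather than two manual regressions.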

The funny thing is that in the attached file there is only one explanatory endogenous variable in one equation (like b21=0 in equation (2)): LOS explains AE, but AE does not explain LOS. Such a system is a "Wold causal chain" and can be estimated with OLS. So the instrumental variable estimation in the paper is completely unnecessary, and I believe that it has caused some inefficiency. :)
 
#6
Thanks!
I guess I understand the concept of endogeneity with your previous explanation and by going back to the study. I also know what they intend to do by using instruments.
However, I don't understand why they use day of the week as an instrument/proxy for length of stay. That's where I'm missing an important theoretical point. Day of the week influences the length of stay, but it does not influence the chances of suffering an adverse event, as they mention. So that would make the variable exogenous to the model. But how does that instrument help you find the correlation between length of stay and adverse events in hospital? If you use length of stay in your analyses, doesn't that re-introduce the biases you had at the beginning? And if you use day of the week in your analyses, how can you infer anything about length of stay from these calculations? That's where I realize I'm missing a lot of background knowledge! :)
 
#9
The funny thing is that in the attached file there is only one explanatory endogenous variable in one equation (like b21=0 in equation (2)): LOS explains AE, but AE does not explain LOS. Such a system is a "Wold causal chain" and can be estimated with OLS. So the instrumental variable estimation in the paper is completely unnecessary, and I believe that it has caused some inefficiency. :)
I think the use of this method is supported by 2 facts: 1) there are unknown factors that influence both LOS and AE, and 2) AE probably influences LOS as well, meaning that if you get a medical complication, you will likely stay longer.
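
The first of those points can be sketched with a toy simulation. This is not the paper's model: the unobserved "severity" variable, the binary weekday indicator, the continuous AE score, and every coefficient are assumptions invented for illustration, and `cov` and `simulate_hospital` are hypothetical helpers. Reverse causation from AE to LOS (the second point) is left out to keep it short.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def simulate_hospital(n=20000, effect=0.3, seed=0):
    """Toy data: weekday shifts LOS; unobserved severity drives LOS and AE."""
    rng = random.Random(seed)
    z, los, ae = [], [], []
    for _ in range(n):
        weekday = 1.0 if rng.random() < 0.5 else 0.0  # hypothetical indicator
        severity = rng.gauss(0, 1)                    # unobserved confounder
        stay = 5.0 + 2.0 * weekday + 1.5 * severity + rng.gauss(0, 1)
        # Weekday affects the outcome only through length of stay.
        outcome = effect * stay + 1.0 * severity + rng.gauss(0, 1)
        z.append(weekday)
        los.append(stay)
        ae.append(outcome)
    return z, los, ae

z, los, ae = simulate_hospital()
ols_slope = cov(los, ae) / cov(los, los)  # naive regression of AE on LOS
iv_slope = cov(z, ae) / cov(z, los)       # instrumental variable (Wald) ratio
print(f"true effect = 0.3, OLS = {ols_slope:.3f}, IV = {iv_slope:.3f}")
```

In this toy setup the naive slope is pushed upward because sicker patients both stay longer and have worse outcomes, while the IV ratio only uses the part of LOS that moves with the weekday, which is unrelated to severity by construction. The whole argument rests on the weekday affecting AE only through LOS. If that assumption fails, the IV estimate is biased too.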

GretaGarbo: are the instruments they are using valuable? How do they get from using days of the week to correlation with LOS?
 
#12
I think the use of this method is supported by 2 facts: 1) there are unknown factors that influence both LOS and AE, and 2) AE probably influences LOS as well, meaning that if you get a medical complication, you will likely stay longer.
But shouldn't AE then also be included in the model as an explanatory variable, so that it would explicitly be an interdependent system and not a Wold causal chain?
But I only looked briefly at the paper and I only wanted to point out the possibility of the causal chain.

GretaGarbo: are the instruments they are using valuable? How do they get from using days of the week to correlation with LOS?
I would need to look at it more to say something about it.


GG, have you ever calculated a local average treatment effect (LATE)?
Nope!