Confused about causality

#1
I'm helping a colleague of mine with his homework for his political science master's. He's investigating the effect of political orientation on support for military missions. He has a large database of survey data. Political orientation (left-right) is measured on a 10-point scale, support for military missions on a 7-point scale. We've established a correlation between right-wing stance and support.

Part of his assignment is to look for a confounding variable. He suggested gender: women are generally more left-wing, and perhaps they are also less likely to support war (being more pacifistic by nature).

The problem is that I have a hard time wrapping my head around the causal interpretation. One option would be:

G -> P -> S (Gender -> Political stance -> Support)

Where the effect of gender is completely mediated by political stance.

The other end of the spectrum would be:

G -> P
G -> S

Where none of the correlation between P and S is actually causal and the correlation is spurious. In reality it's probably somewhere in between.

Is there any meaningful way to say anything about this using statistics, given this non-experimental design? Does it even make sense to treat this as a confounding variable? The idea is that he should use a multiple regression analysis (MRA) with a control variable (gender in this case).
 

noetsi

No cake for spunky
#2
I am not at all sure, as a former political science type, that women can be described as more left-wing. That likely depends heavily on the country and on the dimension of conservatism/liberalism you are measuring, although it is more plausible to me in national security areas (purely an opinion, as I have seen no recent polls). Further, these phenomena change a great deal over time. In the US, for example, support on the left for military missions was probably higher than on the right in the early sixties (compare Eisenhower's final presidential speech with Kennedy's inaugural speech).

I don't really understand the differences you are describing under causality. But in general you can never show causality through statistics, correlation included. Any result you find, however strong, could be spurious. Causation can only be established in theory, not in statistical models. The best you can do is make your argument as plausible as you can by adding variables likely tied to the phenomenon, and by special techniques like SEM, which lets you better see indirect effects (which is really what you are asking, I think: how one variable influences the impact of another variable on a third variable).

SEM is not the simplest of techniques, however (well, it wasn't to me anyhow). :p
 
#3
Thanks for your reply. Basically what I want to do is distinguish between mediation and spurious correlation. I want to know how much of the effect of gender on support for military missions is mediated through the left/right variable.

I guess this approach might work? http://en.wikipedia.org/wiki/Mediator_variable (see the Baron and Kenny steps). But elsewhere I read that you need an experimental design to do a mediation analysis.
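For concreteness, here is a minimal sketch of those Baron and Kenny steps in Python with statsmodels; the file name and column names (gender coded 0/1, orientation on the 10-point scale, support on the 7-point scale) are hypothetical stand-ins for the actual data.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical file

# Step 1: the predictor (gender) must relate to the outcome (support).
total = smf.ols("support ~ gender", data=df).fit()

# Step 2: the predictor must relate to the mediator (orientation).
a_path = smf.ols("orientation ~ gender", data=df).fit()

# Step 3: the mediator must predict the outcome with the predictor
# controlled; if gender's coefficient shrinks toward zero here
# relative to step 1, that is evidence of (partial) mediation.
direct = smf.ols("support ~ gender + orientation", data=df).fit()

print("total effect of gender: ", total.params["gender"])
print("direct effect of gender:", direct.params["gender"])
```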

Using that approach there is some mediation, but it's rather limited. So both variables (gender and political orientation) seem to act relatively independently.

I'm not sure about the political science stuff, to be honest. It's not my area, and I also understand from him that this is more an exercise for his stats assignment. Interesting, though. In any case, within this dataset women are indeed both more left-wing and less likely to support military missions.
 

noetsi

No cake for spunky
#4
I don't believe you can statistically determine what is a spurious relationship and what is mediation (or indirect effects) in any absolute sense. Your design might be able to do so, but never the statistics themselves. Methods make assumptions; for example, in regression (leaving aside the issue of interaction) the independent variables are assumed to affect the dependent variable independently of one another. Whether that is empirically true is another issue.
 

Lazar

Phineas Packard
#5
You have at least two paths in the above model for which the causal direction is clear: G -> P and G -> S. Your problem revolves entirely around P and S. There is a model that lets you test for causal ordering (an autoregressive cross-lagged model), which will allow you to weigh the evidence for P -> S, S -> P, or even a reciprocal relationship between the two. Such models can test two of the conditions for causality (P and S are related, and there is some notion of temporal precedence), but they cannot resolve third-variable problems.
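For illustration, here is a minimal two-wave sketch of that idea, estimated equation-by-equation with OLS (a dedicated SEM package would fit both equations jointly). It assumes two-wave panel data, which the survey described above may not have; the file and column names (p1/s1 for wave 1, p2/s2 for wave 2) are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("two_wave_panel.csv")  # hypothetical file

# Each wave-2 variable is regressed on both wave-1 variables.
# The cross-lagged coefficients are the ones of interest:
# s1 in the p2 equation speaks to S -> P, and
# p1 in the s2 equation speaks to P -> S.
p_model = smf.ols("p2 ~ p1 + s1", data=panel).fit()
s_model = smf.ols("s2 ~ s1 + p1", data=panel).fit()

print("S -> P cross-lag:", p_model.params["s1"])
print("P -> S cross-lag:", s_model.params["p1"])
```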
 

CB

Super Moderator
#7
One option would be:

G -> P -> S (Gender -> Political stance -> Support)

Where the effect of gender is completely mediated by political stance.
If this were the case, then I'm not sure you could really call gender a confound. Statistically controlling for it shouldn't affect the results (i.e. you should see a similar effect of political stance either way, because gender has no effect on support for military action independent of its effect on political stance).

The other end of the spectrum would be:

G -> P
G -> S

Where none of the correlation between P and S is actually causal and the correlation is spurious. In reality it's probably somewhere in between.

Is there any meaningful way to say anything about this using statistics, given this non-experimental design? Does it even make sense to treat this as a confounding variable? The idea is that he should use a multiple regression analysis (MRA) with a control variable (gender in this case).
If the above scenario were the case, then multiple regression might allow you to identify it: perhaps political stance no longer predicts support after controlling for the effect of gender, or perhaps they both have some independent effect on support for military missions.
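As a sketch of that comparison, assuming the data sit in a pandas DataFrame with hypothetical column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical file

# Political stance alone, then with gender as a control.
without = smf.ols("support ~ orientation", data=df).fit()
with_g = smf.ols("support ~ orientation + C(gender)", data=df).fit()

# If orientation's coefficient survives the control largely intact,
# gender is not doing much confounding; if it collapses toward zero,
# the raw correlation was carried by gender.
print(without.params["orientation"], with_g.params["orientation"])
```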

Of course, the regression analysis would only give you unbiased estimates of the independent effects of political stance and gender under quite strict assumptions. E.g., if there are other potential confounds you're excluding, or reciprocal causality is at work (e.g., support for military missions has a causal effect on political stance), then the regression estimates will not be unbiased estimates of the causal effects you're interested in. The models Lazar mentions could be helpful with more complex issues like that.
 

noetsi

No cake for spunky
#8
I was always taught that no statistical method could ever determine causality (I lost track of how many times I have read that in texts). :p It appears that views on this are changing.
 

CB

Super Moderator
#10
I was always taught that no statistical method could ever determine causality (I lost track of how many times I have read that in texts). :p It appears that views on this are changing.
I don't think this is wrong; there are just subtleties. The way I think of it is that when we're trying to demonstrate that A causes B, we need to:
1) Show that A and B are correlated (stats can show this, even using a correlational design, obviously)
2) Show that B doesn't cause A, most likely by showing temporal precedence of A before B (stats can show this without experimentation, but you probably need some kind of longitudinal data collection)
3) Show that there isn't a third variable or confound that explains the correlation between A and B (statistical methods alone can't rule out all confounds, but they can rule out potential confounds that you measure and include in your model)

A true experiment can show that all three conditions are met in one fell swoop, while statistical evidence from non-experimental designs can't. But evidence from non-experimental designs can still allow you to slowly build evidence in favour of causality.
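To make point 3 concrete, here is a toy simulation: A and B are both driven by a confound C and have no causal link, yet they correlate; once C is included in the model, A's coefficient drops to roughly zero. Everything here is simulated, purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
c = rng.normal(size=n)          # the confound
a = c + rng.normal(size=n)      # A caused by C only
b = c + rng.normal(size=n)      # B caused by C only (no A -> B effect)

# A alone "predicts" B, even though neither causes the other...
print(sm.OLS(b, sm.add_constant(a)).fit().params)

# ...but A's coefficient falls to about zero once C is included.
print(sm.OLS(b, sm.add_constant(np.column_stack([a, c]))).fit().params)
```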
 

Lazar

Phineas Packard
#11
Just quickly to add: a) experimental designs do not show individual causal effects but only average causal effects; b) they rest on the assumption that random assignment provides balance on all covariates, which may or may not hold in any given experiment; and c) they assume there are no hidden treatment variations or diffusion. Thus, while experimental design is a gold standard, it is not foolproof.

In addition, there have been considerable advances in non-experimental designs that can go quite some way toward meeting condition 3 that CB states: regression discontinuity, instrumental variables, propensity score matching, and regression adjustment.
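As one example, here is a bare-bones propensity score matching sketch (nearest-neighbour matching on the estimated score, with no caliper or balance diagnostics, which a real analysis would need). The file, covariate, treatment, and outcome names are all hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("observational.csv")        # hypothetical file
covariates = ["age", "education", "income"]  # hypothetical covariates

# 1. Model treatment assignment from the observed covariates.
ps = LogisticRegression().fit(df[covariates], df["treated"])
df["score"] = ps.predict_proba(df[covariates])[:, 1]

# 2. Match each treated unit to the control with the closest score
#    (with replacement, so a control can be reused).
treated = df[df["treated"] == 1]
control = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["score"]])
_, idx = nn.kneighbors(treated[["score"]])
matched = control.iloc[idx.ravel()]

# 3. Compare outcomes in the matched sample.
att = treated["outcome"].mean() - matched["outcome"].mean()
print("Estimated effect on the treated:", att)
```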
 
#12
Another thing to consider is that experiments in the social sciences are rarely done with a representative sample (random or otherwise) of the population (usually the population is not even clearly defined). Most experiments use Western college students between roughly 18 and 25. Since one is mostly interested in differences rather than absolute values, this does not always have to be a problem, but in many cases it might. There's a famous example with the well-known Müller-Lyer illusion, and an article by Henrich, Heine and Norenzayan (2010) collects many more. Subjects are typically from WEIRD societies: Western, Educated, Industrialized, Rich and Democratic (and I would add Young to that).

Not that this has anything to do with causality per se, but it's another reason to be wary of drawing supposedly universal, "human nature" conclusions from experiments in the social sciences.
 

noetsi

No cake for spunky
#13
There are two quasi-experimental designs that are, in my opinion, pretty close to experimental in terms of establishing causality: interrupted time series with control groups (making the two units similar apart from the intervention is, however, very difficult in practice) and regression discontinuity designs (especially the newer forms that address the issue of non-linearity). But neither of these (and especially RDD) is used much.
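For a flavour of RDD, here is a minimal sharp-design sketch: units at or above a cutoff on a running variable receive the treatment, and the jump at the cutoff estimates the local effect. Fitting separate slopes on each side handles simple non-linearity; the file, column names, and cutoff are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("rdd_data.csv")   # hypothetical file
cutoff = 50.0                      # hypothetical cutoff

df["treated"] = (df["running"] >= cutoff).astype(int)
df["centered"] = df["running"] - cutoff

# The coefficient on `treated` is the discontinuity at the cutoff;
# the interaction lets the slope differ on each side.
model = smf.ols("outcome ~ treated + centered + treated:centered",
                data=df).fit()
print(model.params["treated"])
```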

The fundamental problem with random assignment is that people are likely to drop out after assignment but before the project is done. Particularly if they drop out of different groups at different rates, this will significantly harm the results of the analysis. Additionally, generalizability is a significant issue for many randomized studies, because the volunteers do not represent the true population very well.
 