Interaction effect in a logistic regression between continuous and categorical variables.

#1
Hi guys,

I am running a logistic regression on whether a participant tests positive for HIV (1 = positive; 0 = negative) in a sample of transgender people. I have two independent variables: sex worker status (SW; yes or no) and number of sex partners (NsexP).

When I run the logistic regression of HIV status on sex worker status (SW), I obtain a significant p-value: sex workers have twice the odds of being HIV positive compared with non-sex workers. When I add the number of sex partners to the regression, the odds ratio for SW increases considerably, to OR 4.0.

How can I interpret this increase in the OR for the SW variable? I was wondering whether there could be an interaction effect or a suppressor effect between SW and NsexP.

Thanks in advance!
Marvin
 

hlsmith

Omega Contributor
#2
You can introduce a new variable to your model (SW times number of partners). You would keep both SW and # of partners in the model, but also add a variable that is their product. If this new variable is a significant predictor, then you have interaction.
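
As a rough illustration of the product-term idea above, here is a minimal sketch. The coefficient values and the helper names (design_row, log_odds, sw_odds_ratio) are hypothetical and are not estimated from the poster's data. It only shows how the odds ratio for SW behaves with and without a nonzero interaction coefficient.

```python
import math

def design_row(sw, n_partners):
    """Return [intercept, SW, NsexP, SW*NsexP] for one participant."""
    sw = float(sw)
    n_partners = float(n_partners)
    return [1.0, sw, n_partners, sw * n_partners]

def log_odds(coefs, row):
    """Linear predictor on the log-odds scale."""
    return sum(b * x for b, x in zip(coefs, row))

def sw_odds_ratio(coefs, n_partners):
    """Odds ratio for SW=1 vs SW=0 at a fixed number of partners."""
    diff = (log_odds(coefs, design_row(1, n_partners))
            - log_odds(coefs, design_row(0, n_partners)))
    return math.exp(diff)

# Hypothetical coefficients: intercept, SW, NsexP, SW*NsexP
coefs_no_interaction = [-2.0, 0.9, -0.03, 0.0]
coefs_interaction = [-2.0, 0.9, -0.03, 0.02]

for n in (0, 10, 20):
    print(n,
          round(sw_odds_ratio(coefs_no_interaction, n), 3),
          round(sw_odds_ratio(coefs_interaction, n), 3))
```

With the interaction coefficient set to zero, the printed odds ratio for SW is the same at every number of partners. With a nonzero coefficient it changes with the number of partners, which is what a significant product term would imply.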
 
#4
The interaction effect SW*NsexP is not significant. The OR for sex worker is 2.4 on its own, but when I add number of sex partners it increases to 4.8. The effect of number of sex partners itself is not significant (p > 0.05; OR 0.97). Do you have any idea what is going on here?
 

noetsi

Fortran must die
#5
To me, given the lack of interaction, this suggests that the new variable is playing a moderating role and that sex worker status has an indirect effect on HIV through number of sex partners. Logistic regression does not, I believe, model indirect effects. To test this you would have to run something like a structural equation model and determine what the indirect effect is (assuming there is one).

I know little about moderators in regression, so there may be a simpler solution than that.
 

Karabiner

TS Contributor
#6
If a predictor is not related to the dependent variable but is substantially
correlated with another predictor, the prediction can markedly improve.
At least in linear regression models. The term for this is suppressor effect.

With kind regards

K.
 

noetsi

Fortran must die
#8
Do you mean the prediction markedly improves when the suppressor variable is added to the model, or when it is removed from the model?
 
#9
My thought on this is that if you had an a priori reason to believe that the number of sex partners influences the chance of testing positive for HIV, you should keep it in the model, even if it's not significant. I know people who would drop it, but I prefer to let knowledge of the system guide the statistics more often than the other way around. With sex partners in the model, the sex worker effect is the effect over and above whatever can be explained by the number of partners, like type 3 sums of squares in ANOVA (although I am not sure it's exactly the same in logistic regression, which is based on maximum likelihood). You can think of the OR of 4.8 as the effect after the number-of-partners effect has been removed. Although it can depend on the specifics of your study, I would see 4.8 as the better estimate because you accounted for the other predictor.
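
To make "over and above" concrete, here is a small sketch that uses stratification instead of a fitted logistic model. It computes a crude odds ratio and a Mantel-Haenszel pooled odds ratio from two strata. The counts, the two strata ("fewer partners" and "more partners") and the function names are all made up for illustration. They are not the poster's data.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = SW HIV+/HIV-; c, d = non-SW HIV+/HIV-."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical counts per stratum: (SW HIV+, SW HIV-, non-SW HIV+, non-SW HIV-)
strata = {
    "fewer partners": (12, 8, 30, 80),
    "more partners": (20, 60, 2, 24),
}

# Collapse the strata into one table to get the crude (unadjusted) OR
crude_table = tuple(sum(col) for col in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())

for name, table in strata.items():
    print(name, "OR =", odds_ratio(*table))
print("crude OR =", round(crude_or, 2))
print("stratum-adjusted OR =", round(adjusted_or, 2))
```

In these made-up tables the odds ratio is 4 within each stratum, while the pooled (crude) odds ratio is about 1.5, because sex workers are concentrated in the stratum with the lower baseline odds. That is one way an adjusted OR can end up larger than the unadjusted one.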
 
#10
So number of sexual partners plays a suppressor/moderating role in the relationship between SW (sex worker) and HIV? I would just like to understand the logic of this regression so that I can explain it to my supervisor and to an audience without statistical experience.

1. So when I regress HIV (DV) on SW (IV), controlling for number of sexual partners (IV), the effect of SW means that I am comparing sex workers vs. non-sex workers with the same number of sexual partners, right? For example, I am comparing a sex worker vs. a non-sex worker with 20 sexual partners each, so the effect of sexual partners is canceled/controlled. Is this statement correct?
2. Seeing this from another perspective, number of sexual partners in this community does not influence participants' likelihood of having HIV when controlling for sex worker status. How can I interpret this? Perhaps this means that the number of sexual partners a participant has does not predict HIV status if he/she is a sex worker? Would this be the same for participants who are not sex workers?

Thank you in advance. I would like to make a case with this information and perhaps write a manuscript.

Best,
Marvin
 
#13
1. Yes. To illustrate this to yourself, try this: do a simple linear regression (not logistic) of HIV on number of sex partners and output the residuals. Then do a second simple linear regression (actually an ANOVA, again not logistic) of the residuals on sex worker status. What you will see in this second analysis is the effect of sex worker status on HIV status after you have removed all the variation you can attribute to the linear effect of number of sex partners. The output of this analysis will not be technically valid, because the binary response variable prevents you from meeting the assumptions of general linear models. But in many cases the results will be similar to the logistic regression results and might be easier to wrap your head around. This is effectively what "type 3" or "partial" sums of squares tell you: the effect of X1 on Y after the effects of X2 have been controlled for. In your case, I believe it's like comparing sex worker status among individuals who all have the mean number of sex partners.
2. "Number of sexual partners among this community does not influence participants' likelihood to have HIV when controlling for sex worker status": yes, that's my interpretation too. But how do you fully interpret this beyond the statistical output, that is, in the context of the sociological question you are asking? That is a great question to discuss with your PI/supervisor. I find that the conversations that start from the statistical output and attempt to answer "what does this really mean?" are some of the most exciting parts of research. For me, they have been a great way to learn from researchers who have much more experience than I have in my own field. So keep in mind that some questions are helped less by more statistical tools than by knowledge of your research system/field.
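
The two-step exercise in point 1 could be sketched roughly as follows. The data are a made-up toy sample and simple_ols is a hypothetical helper, so the numbers mean nothing in themselves. Note also that this version residualizes only the outcome, as described above, so its slope will generally not equal the SW coefficient from a model that includes both predictors at once.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for y regressed on a single x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data (NOT the poster's sample)
hiv = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
partners = [2, 5, 3, 8, 20, 15, 4, 30, 6, 25]
sex_worker = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# Step 1: regress HIV on number of partners and keep the residuals
intercept, slope = simple_ols(partners, hiv)
residuals = [y - (intercept + slope * x) for x, y in zip(partners, hiv)]

# Step 2: regress the residuals on sex worker status.
# With a 0/1 predictor, the slope is the difference in mean residuals.
_, sw_effect = simple_ols(sex_worker, residuals)

print("slope for partners:", round(slope, 4))
print("SW effect on residuals:", round(sw_effect, 4))
```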
 

noetsi

Fortran must die
#14
Is centering strongly recommended when adding interaction terms? I had not heard that.

As I understand it, centering changes the definition of the intercept, which is obviously important if you work with categorical variables (as I normally do).
 

noetsi

Fortran must die
#15
It sounds to me that it would be simpler to run a VIF or tolerance test and, if they don't show multicollinearity (MC), not to worry about centering.

Here is a different question on interaction. Say you have the following
Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X1*X2 ...

And this was the correct model, that is, variable 3 was not involved in an interaction. When you analyze the effect of X1 at specific levels of X2 (simple effects), do you still have to analyze it at specific levels of X3, as with a three-way interaction, or not?

I don't think so, but I ran across a reference that made me think you might.
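
One way to check this intuition, assuming the model really is the additive one written above with a single X1*X2 product term, is to compute the simple effect of X1 directly. The coefficient values and function names below are hypothetical.

```python
def predict(b, x1, x2, x3):
    """Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X1*X2 (no term involving X1 and X3)."""
    b0, b1, b2, b3, b4 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x1 * x2

def simple_effect_x1(b, x2, x3):
    """Change in Y for a one-unit increase in X1 at given values of X2 and X3."""
    return predict(b, 1, x2, x3) - predict(b, 0, x2, x3)

# Hypothetical coefficients B0..B4
b = (1.0, 0.5, -0.25, 2.0, 0.75)

for x2 in (0, 1, 2):
    effects = [simple_effect_x1(b, x2, x3) for x3 in (-4, 0, 8)]
    print("X2 =", x2, "-> effect of X1 at X3 = -4, 0, 8:", effects)
```

Under this model the simple effect of X1 works out to B1 + B4*X2, so it changes with X2 but comes out the same whatever value X3 takes. That would not hold if the true model had a term involving both X1 and X3.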
 
#16
Is centering strongly recommended when adding interaction terms?
Yes yes. :)

I had not heard that.
Me neither, unfortunately! But I will never add an interaction to a model again before I center at least the continuous or ordinal variables involved. It is remotely possible that non-centered variables do not correlate strongly with their interactions, but most of the time some major multicollinearity appears when one models uncentered variables together with their interaction(s).

As I understand it, centering changes the definition of the intercept, which is obviously important if you work with categorical variables (as I normally do).
It can also cause other disturbances (like the one I am struggling with), and this is one of the many reasons I am a 'multicollinearity hater' :p, but it is a very good way to make sure the added interactions are not sources of multicollinearity.
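
A quick way to see the effect of centering on this kind of collinearity is to compare the correlation between a predictor and its product term before and after centering. This is only a sketch on a made-up toy sample. The pearson and center helpers are hypothetical, and both variables are centered here (including the 0/1 one), which is an assumption of this example.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def center(x):
    """Subtract the mean from every value."""
    m = mean(x)
    return [xi - m for xi in x]

# Hypothetical toy data (NOT the poster's sample)
partners = [2, 5, 3, 8, 20, 15, 4, 30, 6, 25]
sex_worker = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# Product term built from the raw variables
raw_product = [s * n for s, n in zip(sex_worker, partners)]
r_raw = pearson(partners, raw_product)

# Product term built from the centered variables
partners_c = center(partners)
sex_worker_c = center(sex_worker)
centered_product = [s * n for s, n in zip(sex_worker_c, partners_c)]
r_centered = pearson(partners_c, centered_product)

print("corr(partners, product), raw:     ", round(r_raw, 3))
print("corr(partners, product), centered:", round(r_centered, 3))
```

In this toy sample the correlation drops substantially after centering but does not vanish, since the partner counts are skewed. How much centering helps depends on the data.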

--------------------------------------

It sounds to me that it would be simpler to run a VIF or tolerance test and, if they don't show multicollinearity (MC), not to worry about centering.
That sounds quite fine to me, but the problem is that most of the time, when you add interactions to the model, the VIF indicates multicollinearity. So you need to either remove some variables (main effects or interactions) or center your main variables.

Here is a different question on interaction. Say you have the following
Y = B0 + B1*X1 + B2*X2 + B3*X3 + B4*X1*X2 ...

And this was the correct model, that is, variable 3 was not involved in an interaction. When you analyze the effect of X1 at specific levels of X2 (simple effects), do you still have to analyze it at specific levels of X3, as with a three-way interaction, or not?

I don't think so, but I ran across a reference that made me think you might.
You mean would I prefer entering three- or four-way interactions into the model as well?

Well, I would if 1) the LRT showed me that the model had become more accurately predictive, 2) it did not cause my currently significant variables to go non-significant (once the number of variables and interactions gets too high, the model itself becomes excellent but many predictors become non-significant, so I prefer a less accurate model with a couple of significant predictors), and 3) I could easily understand or interpret them. Two-way interactions seem quite disturbing most of the time, let alone three- or four-way interactions. In my field even a multivariate analysis (with no or only a few two-way interactions) can be disturbing for clinicians, reviewers and editors (I had this experience last week), so for practical reasons I would avoid higher-order interactions even if they favored my model.

====================

No, I mean if you are doing a simple-effects analysis of a two-way interaction (and you don't believe there are three-way or higher interactions), do you still need to analyze it at a specific level of the other IV, as you would if there were a three-way interaction? That makes no sense to me, but I ran into an article that said you did.

I don't worry about three-way or higher interactions. For one thing, others have found them to be rare, and even if they exist I doubt I could explain them. It's difficult enough to explain a two-way interaction. Three-way interaction makes my head hurt :(

Aha, I think you mean like this article. Actually I would never do it, because it is way beyond me and my audience (not because it is not a good thing)! :)
 

noetsi

Fortran must die
#17
No, I mean if you are doing a simple-effects analysis of a two-way interaction (and you don't believe there are three-way or higher interactions), do you still need to analyze it at a specific level of the other IV, as you would if there were a three-way interaction? That makes no sense to me, but I ran into an article that said you did.

I don't worry about three-way or higher interactions. For one thing, others have found them to be rare, and even if they exist I doubt I could explain them. It's difficult enough to explain a two-way interaction. Three-way interaction makes my head hurt :(
 
#18
Wow, I am very thankful to all of you. Tarek, your response was outstanding, as were Victor's and Noetsi's. I will follow your advice and get back to you. My ultimate goal with this post is to be able to explain this finding (assuming that it is valid) to an audience without statistical experience, and perhaps to write a paper for publication. What would be good takeaways or conclusions from this finding (again, supposing it is correct)?

Thank you all!
 
#19
If the model is correct, then multicollinearity will not bias your coefficients; it will just make your estimates of those coefficients inefficient (i.e., the standard errors will be larger than they would otherwise be).

So if he likes the model then I see no reason not to go ahead and interpret the coefficients.
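
To get a feel for how much larger the standard errors can get, here is a small sketch based on the variance inflation factor for the two-predictor linear regression case, where VIF = 1 / (1 - r^2) and r is the correlation between the two predictors. Applying it to a logistic model is only a rough guide, and the function names are hypothetical.

```python
import math

def vif_from_r(r):
    """Variance inflation factor when two predictors have correlation r."""
    return 1.0 / (1.0 - r * r)

def se_inflation(r):
    """Factor by which the standard error grows relative to r = 0."""
    return math.sqrt(vif_from_r(r))

for r in (0.0, 0.5, 0.9, 0.98):
    print("r =", r,
          "VIF =", round(vif_from_r(r), 2),
          "SE inflation =", round(se_inflation(r), 2))
```

The coefficient estimate itself is not shifted in this calculation. Only its precision changes, which is the point made above.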
 

noetsi

Fortran must die
#20
I am trying to figure out why victor deleted his own post :p

I agree with threestars, although because the SEs are inflated it's possible that a variable with a real effect will not show up as statistically significant in the test. This should be pointed out, particularly for p-values near significance. Journals are likely to object if you have high MC and don't address it. But the coefficients will be (as threestar notes) correct despite MC.

MC is one of the hardest problems to deal with (although I now hear that centering will sometimes help). You might consult the Fox Sage monograph on Regression Diagnostics, although the long and short of it is that he says there is no easy way to deal with it :p Well, he also notes that MC has to be really extreme to matter.