Interaction test

#1
I have a binary outcome and a binary independent variable, and am using an interaction test in SAS to determine whether sex (male vs female) modifies the association between the independent variable and the outcome. SAS shows that the interaction test p-value is significant, but I don't understand how this p-value is calculated, i.e., is SAS using an ANOVA, a chi-square goodness-of-fit test, or something else to compute this interaction test p-value?

Second, what does it mean when the interaction test is significant, but the individual strata analyses are both not significant (and vice versa)?

Thanks in advance!
 

hlsmith

Omega Contributor
#2
Please post your code and I can then help. You are using PROC LOGISTIC, correct? It also wouldn't hurt to post some of the output, especially the beta coefficients.
 
#3
proc genmod data=data1;
   /* exposure is reference-coded with "0" as the reference; pair identifies matched pairs */
   class exposedstatus (param=ref ref="0") pair;
   /* Poisson distribution + log link = modified Poisson model for a binary outcome */
   model outcome = exposedstatus sex exposedstatus*sex / dist=poisson link=log;
   estimate "exposedstatus" exposedstatus 1 -1 / exp;
   /* GEE clustered on pair to obtain robust (sandwich) standard errors */
   repeated subject=pair / type=unstr;
run;

In the output table 'Contrast Estimate Results', the right-most columns say 'Chi-Square' and 'Pr > ChiSq'.

Sorry I can't post my output, but I hope you may be able to answer my first question, and potentially let me know how you would go about evaluating the output to answer my second question.
 
#5
This is a cross-sectional study, which has both an unmatched and a matched analysis (hence the pair variable), and I'm using modified Poisson regression to estimate prevalence ratios, as I didn't want ORs from logistic regression because they overstate the PRs when the outcome is common. Modified Poisson is also robust against the convergence issues that log-binomial models can have.
The outcome is a binary variable for cancer screening participation (screened vs not screened). The literature in this area shows modified Poisson is appropriate to use.
 
#7
That may not have been clear: in my study, I am doing a matched analysis to control for possible confounding, and separately, an unmatched analysis, to maximize external validity.
 

hlsmith

Omega Contributor
#10
Well, I am glad I asked a bunch of questions and didn't assume you were just running logistic regression. I have not run it with dist=Poisson and matching. Thus I would not be confident interpreting it with any exactitude.


Though the traditional interpretation for a multiplicative-scale interaction term would be:


RR11 / (RR10 x RR01) being statistically different from 1: < 1 indicates an antagonistic relationship, > 1 a synergistic relationship.


Here RR11 is the risk ratio for subjects with both exposures, and RR10 and RR01 are the risk ratios for subjects with only one of the two exposures.
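To make that concrete, here is a quick worked example with made-up numbers (purely illustrative, not from your data):

data _null_;
   /* hypothetical risk ratios relative to the doubly unexposed group */
   rr10 = 1.5;                               /* exposure 1 only        */
   rr01 = 2.0;                               /* exposure 2 only        */
   rr11 = 4.5;                               /* both exposures present */
   mult_interaction = rr11 / (rr10 * rr01);  /* = 1.5, i.e. > 1, synergistic on the multiplicative scale */
   put mult_interaction=;
run;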
I wonder if you can get away with doing this in PROC LOGISTIC, naming the dist and link, and using (I believe) STRATA for the matching variable. If so, you may also be able to use the EFFECTPLOT statement, which can output graphics depicting the disordinal (crossing) lines associated with an interaction.
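A minimal sketch of that idea, reusing your variable names, would be something like the following. This is ordinary logistic regression for the unmatched analysis (so ORs rather than PRs), I have put sex in the CLASS statement purely for plotting, and I have not checked whether EFFECTPLOT can be combined with a STRATA statement for the matched case:

ods graphics on;
proc logistic data=data1;
   class exposedstatus (param=ref ref="0") sex (param=ref ref="0");
   /* event="1" assumes the outcome is coded 0/1 with 1 = event */
   model outcome(event="1") = exposedstatus sex exposedstatus*sex;
   /* interaction plot: predicted probability across exposure, one line per sex */
   effectplot interaction(x=exposedstatus sliceby=sex);
run;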


Interpretation of a significant interaction is typically done by calculating two RRs for the mutable exposure, stratified by the other variable. So if I had an interaction term for cancer and the two exposures were gender and smoking status, I would present the smoking RR for women and also for men. You do this because you cannot make a woman a man or vice versa if you intervened on the sample, but you could stop a person from smoking - that would be a viable intervention or policy.
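In GENMOD terms, one way to get those stratum-specific estimates is with ESTIMATE statements. A sketch, assuming (as your MODEL statement implies) that exposedstatus is reference-coded with "0" as the reference and sex is a numeric 0/1 covariate:

proc genmod data=data1;
   class exposedstatus (param=ref ref="0") pair;
   model outcome = exposedstatus sex exposedstatus*sex / dist=poisson link=log;
   /* exposure PR within each sex stratum (assumes sex is coded 0/1) */
   estimate "PR exposed vs unexposed, sex=0" exposedstatus 1 / exp;
   estimate "PR exposed vs unexposed, sex=1" exposedstatus 1 exposedstatus*sex 1 / exp;
   repeated subject=pair / type=unstr;
run;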



P.S., I would still be interested in seeing a reference to the literature you mentioned related to the use of this model.
 
#11
Thanks for the info.

I keep seeing online information about using chi-square goodness-of-fit tests to calculate the p-value in interaction tests. Is that what you expect SAS is doing to calculate my p-values, given my code above?

Here is a link to one such article that uses modified Poisson regression (non-matched analysis): http://bmcwomenshealth.biomedcentral.com/articles/10.1186/1472-6874-11-20 There is good evidence that modified Poisson is appropriate in these types of situations: https://www.ncbi.nlm.nih.gov/pubmed/15033648 (I think that methods article has >2000 citations). The only thing that seems odd is that it is hard to actually justify that the Poisson distribution is satisfied. But given that modified Poisson, when used with binary outcomes, is already misspecified, I think that is why you cannot actually verify the Poisson assumptions (based on correspondence I had with the author of that second article).
 

hlsmith

Omega Contributor
#12
Yeah, I have seen the use of Poisson to get at the RR. Though many may argue that if you don't have prospective data you should not use the RR as a measure, due to the limited ability to ensure the temporal relationship between exposure and outcome (in your case, that both variables precede the outcome). Risk is essentially the same as incidence, and incidence is measured in a cohort initially naïve of the outcome, with incidence rates determined after time elapses.


You allude to this, but if the outcome is rare (~10% or less), the odds is a close enough proxy for the risk, so the OR approximates the RR.
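A quick illustration of why that approximation holds for rare outcomes but breaks down for common ones (the numbers are arbitrary):

data _null_;
   do risk = 0.10, 0.40;
      odds = risk / (1 - risk);   /* 0.111 at 10% risk, but 0.667 at 40% risk */
      put risk= odds=;
   end;
run;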


I really can't make a determination about your output without seeing it.
 

hlsmith

Omega Contributor
#13
I just downloaded the article, and per their validation using 2x2 tables, you could also probably do that by using the Cochran-Mantel-Haenszel (CMH) test.




Correction: I meant the Breslow-Day test, but it may be limited to odds ratios.
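For reference, a minimal sketch of what that would look like in PROC FREQ, reusing the variable names from the code above (the CMH option on a stratified 2x2 table also prints the Breslow-Day test of homogeneity of the odds ratios):

proc freq data=data1;
   /* sex-stratified 2x2 tables of exposure by outcome */
   tables sex*exposedstatus*outcome / cmh;
run;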
 
#14
I have also heard that it's poor practice to use relative risk in anything but a prospective study (i.e., it should be avoided in retrospective studies). The reason is that in a retrospective study you don't know the true baseline number at risk (the denominator) at the start of the period, whereas with a prospective study you have the baseline number at risk and move forward from that point, recording the events. Odds and odds ratios are viewed as more appropriate for any retrospective study for this reason (from what I've heard and seen).
 

hlsmith

Omega Contributor
#15
I just got done skimming the Zou article. So just by adding the REPEATED statement, the Huber-White (robust) SEs are elicited. But it read as though this works when you only have one observation per repeated cluster. I will point out that you put subject=pair, which would negate this, correct? It would be interesting to see a graph of how erroneously narrow the CIs are using this approach. The authors attempted to get at the CIs' accuracy by using a simple example that could also be run using the MH procedure. Hmmm, very applied and not theoretical.
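For comparison, the usual Zou-style specification for an unmatched analysis looks roughly like the following, with each subject as its own cluster and an independence working correlation (the id variable name here is hypothetical, standing in for a subject-level identifier):

proc genmod data=data1;
   class exposedstatus (param=ref ref="0") id;   /* id = one level per subject (hypothetical name) */
   model outcome = exposedstatus sex exposedstatus*sex / dist=poisson link=log;
   repeated subject=id / type=ind;               /* robust (sandwich) standard errors */
run;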


What is your sample size and proportions for the binary outcome?


Also, you have an estimate statement; I believe most would strongly feel you shouldn't attempt to interpret the base terms when you have a significant interaction term, because the base terms are actually conditional on the other variable in the interaction. I am not that familiar with PROC GENMOD, but I know it has a bunch of good features. One thing that you can do is test the significance of the interaction term by running the model with and without it, then running the chi-square likelihood ratio test yourself by comparing the difference in the -2LL values, I believe. The interaction's interpretation is just a multiplicative effect conditional on the moderator.
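One caveat on the -2LL comparison: with the REPEATED statement you are fitting a GEE, which has no true likelihood to difference, so an alternative is to ask GENMOD for a Type 3 test of the interaction term directly. A rough sketch, reusing your variable names:

proc genmod data=data1;
   class exposedstatus (param=ref ref="0") pair;
   /* with a REPEATED statement, TYPE3 gives generalized score tests for each term,
      including the interaction; add the WALD option for Wald statistics instead */
   model outcome = exposedstatus sex exposedstatus*sex / dist=poisson link=log type3;
   repeated subject=pair / type=unstr;
run;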
 
#17
Thanks for that link.

If I have 15% missing data for a categorical covariate in a multivariable modified Poisson model (using similar PROC GENMOD code as before), and I make a "missing" category level within the variable, is there anything I should be aware of, given that I'm using GEE with the REPEATED subject statement? And would there be different implications of doing this if the data are missing completely at random versus missing at random?
 

hlsmith

Omega Contributor
#19
As for missing data, do you know the missingness mechanism? If the data are MCAR, do nothing; if MAR, imputation is the best option; if MNAR, there is nothing you can do.
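If the data turn out to be MAR, a minimal multiple-imputation sketch would look something like the following (the covariate name covar_miss is hypothetical; you would then fit the GENMOD model BY _Imputation_ and pool the results with PROC MIANALYZE):

proc mi data=data1 out=data1_mi nimpute=20 seed=12345;
   class covar_miss exposedstatus;
   /* fully conditional specification; logistic model for the categorical covariate
      (use LINK=GLOGIT, or DISCRIM, if it is nominal with more than two levels) */
   fcs logistic(covar_miss);
   var covar_miss exposedstatus sex outcome;
run;

Note this ignores the pairing; imputing within a matched/GEE structure has additional wrinkles worth checking before relying on it.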


15% is fairly substantial. How big is your effect size? Is it large enough to imagine that it could overshadow a 15% systematic loss of data?
 
#20
In that first link, the outputs are very similar in terms of table formatting, and it suggests that Wald CIs and Wald chi-square values are calculated by default. But in the second link you posted, I think it suggests that you actually need to specify the WALD option to produce Wald statistics (unless Wald statistics constitute some additional stats apart from the CIs and the chi-square value).

For my missing data, the relative measure is about 1.3 (p<0.05) for the non-missing levels of the variable (in both the adjusted and unadjusted models), and I think the missing-category relative measure changes direction and crosses the null when comparing its unadjusted vs adjusted values, but I'm not trying to interpret that missing category with a high degree of confidence.

When you say imputation for MAR, do you mean multiple imputation? Because I think mode imputation would introduce a lot of bias, since we have no basis for determining which level of the variable the missing values belong to.