What test to use? Correlation/regression VS ANOVA

Hi all. I have a very shallow understanding of statistics, so I would really appreciate it if someone could help me with what test to use.

Say I have two groups of people (A and B). I want to compare the relationship between stress and smoking in these two groups of people.

Group (A or B) = categorical
Stress = continuous score
Smoking = a continuous score or can be categorised into light/moderate/heavy smokers

I hypothesize that in both groups, as stress score increases so does smoking score.

I also hypothesize that group A are more sensitive to this effect (due to neurobiology). So an increase of stress score in group A causes a greater increase in smoking than it does for group B.

In order to test this I have thought of two statistical approaches.
First, I could carry out a correlation/regression to see the link between stress and smoking in each group. I could then somehow compare the gradient values of each group to see which one has the steeper gradient (and is therefore more sensitive to the effects of stress on smoking). However, I'm not sure how to do this...

Second, I could perform some sort of ANOVA by using the smoking categories. I could see if there is a difference in stress scores between light, moderate and heavy smokers. I could then look at the interaction with "group" by demonstrating a small group difference in stress scores for light smokers, but a bigger difference for heavy smokers. Again, I'm not sure how to do this...

My question is which approach makes more sense (if either)? And any help you can give on how to do this in SPSS would be much appreciated.

Thanks very much in advance!
No ANOVA! As a rule, given continuous data, you should never arbitrarily divide it into high/medium/low categories in order to do an ANOVA. Doing so throws away information in multiple ways. Unfortunately, it's often done by people who never learned to do anything other than ANOVAs.

Here is how I would do this analysis: For each group, I would do a linear regression of stress vs. smoking. That gives me a slope, with an error bar, saying how many units of additional stress are associated with each additional unit of smoking. I would then compare the slopes to determine whether they differ significantly between the two groups.

Suppose, for example, that the slope is 1.2 +/- 0.3 for group A and 2.1 +/- 0.4 for group B. The difference in slopes is thus 0.9 +/- 0.5 (the standard errors add in quadrature), corresponding to a 95% confidence interval of [-0.1, 1.9], so not quite significant at the 95% level.
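If it helps, the slope-comparison arithmetic above can be sketched in a few lines of Python (the slopes and standard errors are the made-up example values from the previous paragraph, not real data):

```python
import math

# Hypothetical slopes and standard errors from two separate regressions
# (the 1.2 +/- 0.3 and 2.1 +/- 0.4 values are made-up examples)
slope_a, se_a = 1.2, 0.3
slope_b, se_b = 2.1, 0.4

# Difference of slopes; for independent samples the SEs add in quadrature
diff = slope_b - slope_a
se_diff = math.sqrt(se_a**2 + se_b**2)

# Normal-approximation 95% confidence interval (z = 1.96)
lo = diff - 1.96 * se_diff
hi = diff + 1.96 * se_diff
print(f"difference = {diff:.1f} +/- {se_diff:.1f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Since the interval includes 0, you would fail to reject the hypothesis that the two slopes are equal.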


I agree with ichbin on the use of regression (why waste perfectly good numeric data?). I would approach the regression slightly differently. I'd enter smoking as the DV, then stress as the first predictor/covariate, and then enter group as a final IV. To do this you'd dummy code the groups into another variable/column, coded 0-1: 0 for group A and 1 for group B. Group A's line is then captured by the constant, and the beta for the group dummy isn't a slope but the mean difference between the groups at a given stress level; to capture the difference in slopes between the 2 groups you also need to include the group-by-stress interaction term in the model. Ichbin's technique works well with 2 groups, but as the number of groups increases it becomes a pain to rerun the analysis multiple times.
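A minimal sketch of this dummy-coded model in Python with NumPy (all the data here is simulated purely for illustration, and the variable names are my own, not from SPSS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: stress scores and a 0/1 dummy for group (0 = A, 1 = B)
n = 100
group = np.repeat([0, 1], n)                  # dummy-coded group variable
stress = rng.uniform(0, 10, size=2 * n)
# Simulate smoking with the same stress slope in both groups,
# but a higher overall level in group B (a mean shift, not a slope change)
smoking = 2.0 + 1.5 * stress + 3.0 * group + rng.normal(0, 1, size=2 * n)

# Design matrix: intercept (the constant), stress, group dummy
X = np.column_stack([np.ones(2 * n), stress, group])
coefs, *_ = np.linalg.lstsq(X, smoking, rcond=None)
intercept, b_stress, b_group = coefs

# b_group is the mean difference between groups at a given stress level;
# a difference in slopes would require a group*stress interaction term
print(f"intercept={intercept:.2f}, stress slope={b_stress:.2f}, "
      f"group shift={b_group:.2f}")
```

The appeal of this approach is that one model covers both groups, so adding more groups just means adding more dummy columns rather than rerunning separate regressions.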
I agree with the others, don't make your continuous variable into categories to test for interactions. Not only do you lose information that way, but you can actually mask a significant interaction between the original variables or create one that did not exist just as an artifact of the dichotomization! This article goes into more detail on the topic:

Here is my advice. What you have hypothesized is an interaction effect between group (g) and smoking (s) on stress (y). So you have three effects to test: the main effect of group, the main effect of smoking, and the interaction between them. What you want to do to statistically test the interaction effect (is it different from 0?) is to create a new variable to represent this effect by literally multiplying the group variable by the smoking variable (g*s).

When you are running the regression first run a model without the interaction:
y = b0 + (b1)g + (b2)s
The F test will tell you whether either of these coefficients is different from 0, and the t-test for each coefficient will be your main effects test. Next, to test the interaction, add the new interaction variable to the model:
y = b0 + (b1)g + (b2)s + (b3)(g*s)

The t-test for the g*s coefficient will be the statistical test of your interaction hypothesis; if this is not significant, you don’t have an interaction. (Note: ignore the t-tests for the main effects in this new model; we already know from our first model if they are significant or not, and collinearity with the interaction can change their effects).
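The two-model procedure above can be sketched in Python with NumPy (the data is simulated for illustration, with group B deliberately given a steeper smoking-to-stress slope so the interaction is real; the `fit` helper is my own, not a library function):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: group dummy (g), smoking score (s), stress outcome (y),
# generated so that group B has a steeper smoking->stress slope
n = 150
g = np.repeat([0.0, 1.0], n)
s = rng.uniform(0, 10, size=2 * n)
y = 1.0 + 0.5 * g + 1.0 * s + 0.8 * g * s + rng.normal(0, 1, size=2 * n)

def fit(X, y):
    """OLS fit returning coefficients and their standard errors."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coefs, se

# Model 1: main effects only, y = b0 + b1*g + b2*s
X1 = np.column_stack([np.ones(2 * n), g, s])
coefs1, se1 = fit(X1, y)

# Model 2: add the literal product g*s as the interaction term
X2 = np.column_stack([np.ones(2 * n), g, s, g * s])
coefs2, se2 = fit(X2, y)

t_interaction = coefs2[3] / se2[3]   # t statistic for the g*s coefficient
print(f"interaction b3 = {coefs2[3]:.2f}, t = {t_interaction:.1f}")
```

A large |t| for the g*s coefficient here is the statistical evidence for the interaction; with this simulated data it comes out highly significant, as it should.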

If you do have an interaction, rerun the regression but separately for each group to see the effect of smoking on stress in each group (is the slope steeper in group A or group B?).
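The per-group follow-up can be sketched the same way (again with simulated data, chosen so that group B's slope is steeper; `np.polyfit` with degree 1 is just a simple linear regression):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated follow-up data (illustration only): same smoking scores,
# but stress responds more steeply to smoking in group B than in group A
n = 100
smoking = rng.uniform(0, 10, size=n)
groups = {
    "A": 1.0 + 1.2 * smoking + rng.normal(0, 1, size=n),  # stress, group A
    "B": 1.0 + 2.1 * smoking + rng.normal(0, 1, size=n),  # stress, group B
}

# Fit a separate simple regression per group and compare the slopes
slopes = {}
for name, stress in groups.items():
    slope, intercept = np.polyfit(smoking, stress, 1)
    slopes[name] = slope
    print(f"group {name}: slope = {slope:.2f}")
```

Comparing `slopes["A"]` and `slopes["B"]` then answers the question of which group is more sensitive.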

For more information on regression and interaction testing check out Cohen, Cohen, Aiken, and West: