# A question about whether to drop interaction terms as a whole or not

#### freezingswallow

##### New Member
Hello!
I'm taking a regression course this semester and I have a question regarding when to drop the interaction terms.
Suppose we have a regression model like this:
Y = a + b1*X1 + b2*X2 + b3*X3 + b4*X1_X3 + b5*X2_X3
where Y is the response variable (say, the outcome of a disease),
X1 and X2 are both indicator variables for the same factor (for example, there are three BMI categories, and I choose the third category to be the baseline group)
X3 is another variable (e.g., age)
X1_X3 is the interaction between X1 and X3
X2_X3 is the interaction between X2 and X3
After running the regression model on a data set, if we find that the coefficient for X1_X3 is non-significant (p-value > 0.05) but the coefficient for X2_X3 is significant (p-value < 0.05), should we drop X1_X3 only, or both X1_X3 and X2_X3?
My question is: when the two interaction terms are between the same two factors (just different indicator variables for different levels of the same variable), should we treat these interaction terms as a whole or treat them separately?
As we know, for the main effect, we cannot drop X1 only and leave X2 in the model. I'm not sure if this is still the case for interaction terms!
Thank you!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I did not completely follow your description; it could probably benefit from some line breaks/spacing.

Did you enter BMI into the model as a single categorical variable, and the program then kicked out more than one beta coefficient?

I believe you keep them all in when they cover the same variable (e.g., BMI).
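
To make the setup concrete, here is a minimal sketch of how a single three-level BMI factor expands into the columns of the model in the first post. The helper name `design_row` and the example ages are hypothetical and only for illustration; category 3 is the baseline, as described in the question.

```python
# Hypothetical helper: expand one BMI category and an age value
# into the columns X1, X2, X3, X1_X3, X2_X3 used in the model.
def design_row(bmi_group, age):
    x1 = 1 if bmi_group == 1 else 0  # indicator for BMI category 1
    x2 = 1 if bmi_group == 2 else 0  # indicator for BMI category 2
    # Category 3 is the baseline, so both indicators are 0 for it.
    return {
        "X1": x1,
        "X2": x2,
        "X3": age,
        "X1_X3": x1 * age,  # interaction of category 1 with age
        "X2_X3": x2 * age,  # interaction of category 2 with age
    }

# One made-up subject per BMI category.
for group, age in [(1, 50), (2, 60), (3, 70)]:
    print(group, design_row(group, age))
```

Both interaction columns come from the same BMI-by-age interaction, which is why they are usually handled as one term with two degrees of freedom.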

#### freezingswallow

##### New Member
Sorry for the confusion.
I mean I'm trying to build a model to predict coronary heart disease, for example.
And my current model is:
Y = a + b1*BMI_Group1 + b2*BMI_Group2 + b3*age + b4*BMI1_Age + b5*BMI2_Age
where a, b1~b5 are just coefficients,
BMI1_Age is the interaction between BMI_Group1 and age
BMI2_Age is the interaction between BMI_Group2 and age
I ran the model on my data set and found that the p-value for one of the interaction terms is greater than 0.05 and the p-value for the other interaction term is less than 0.05.
I'm not sure whether I should keep both interaction terms in or just the significant one.
Is this clearer?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I totally got that part; at least, that is what I assumed you meant. So my response would still be the same as in post #2 above.

I believe you keep them all in when they cover the same variable (e.g., BMI).
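
One common way to treat the two interaction terms as a whole is to test them jointly: fit the model with both terms, fit it again with neither, and compare. The sketch below does this with a partial F-test on simulated data. Everything in it is an assumption for illustration: the data are made up, the outcome is continuous and fit by ordinary least squares, age is centered at 50, and the helpers `solve` and `rss` are hypothetical. For a binary disease outcome fit by logistic regression, a likelihood ratio test would play the same role.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(X, y):
    """Fit OLS via the normal equations; return the residual sum of squares."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Simulated data (an assumption): only the group-1-by-age interaction is nonzero.
random.seed(0)
n = 300
full, reduced, y = [], [], []
for _ in range(n):
    group = random.choice([1, 2, 3])
    age_c = random.uniform(30, 70) - 50  # age centered at 50
    x1, x2 = int(group == 1), int(group == 2)
    reduced.append([1.0, x1, x2, age_c])
    full.append([1.0, x1, x2, age_c, x1 * age_c, x2 * age_c])
    y.append(1.0 + 0.5 * x1 - 0.3 * x2 + 0.02 * age_c
             + 0.1 * x1 * age_c + random.gauss(0, 1))

rss_full, rss_reduced = rss(full, y), rss(reduced, y)
q = 2                # number of interaction coefficients tested together
df_resid = n - 6     # residual degrees of freedom of the full model
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
# With exactly 2 numerator df, the F upper-tail probability has a closed form.
p_value = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)
print(f"F = {f_stat:.2f} on ({q}, {df_resid}) df, p = {p_value:.3g}")
```

A single p-value for the whole BMI-by-age interaction then drives the decision: keep both terms or drop both, rather than picking between the individual coefficients, whose p-values depend on which category is chosen as the baseline.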