Regression with Continuous and Dummy Independent Variables

#1
Dear Colleagues,

I am running a multivariate linear regression analysis but got stuck with this situation. I have one dependent variable (which is continuous) and several independent variables. Three of them are continuous and three of them are dummy variables (they have values of 1 and 0).

Could you please help me with how I should treat these dummy variables in my analysis? The only thing I am sure about is that I should include only two of my three dummies in the regression (according to the k-1 rule). Is this enough? Do I need to take any further steps? Moreover, is the omitted dummy variable a "reference variable"?

I use SPSS but also have some basic R knowledge, so I could try the steps in both!

Thank you very much for your help in advance!
 
#3
Are the dummy variables related? That is, are they indicators for the same categorical variable?
Dear Dason, yes they are. I had one categorical variable that I transformed to several dummy variables - i.e. there was one variable with three values (1, 2 and 3) that I transformed into three separate dummies (with values 0 and 1).
 

ondansetron

#4
A small point and then some help: a multivariate regression has multiple dependent variables while a multivariable regression has multiple independent variables with one dependent variable.

As for the dummies, it looks like you set them up correctly. If you fit an intercept, the k-1 rule works perfectly: include two of the three dummies. Without an intercept, you would need all 3 dummies. You're spot on that the omitted dummy becomes the reference level, i.e. the category for which both included dummies equal 0 (0,0).
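
For anyone who wants to see the k-1 rule in code, here is a minimal sketch in Python with NumPy. The data are made up for illustration, the variable names are hypothetical, and the continuous predictors are left out to keep it short. It fits the model three ways: intercept plus two dummies, intercept plus all three dummies, and no intercept with all three dummies.

```python
import numpy as np

# Hypothetical toy data: a 3-level category (coded 1, 2, 3) and an outcome y.
group = np.array([1, 1, 2, 2, 3, 3])
y = np.array([10.0, 12.0, 15.0, 17.0, 20.0, 24.0])

# One 0/1 dummy per level.
d1, d2, d3 = [(group == g).astype(float) for g in (1, 2, 3)]
ones = np.ones_like(y)

# Intercept + k-1 dummies: level 1 is omitted, so it is the reference level.
# The intercept estimates the level-1 mean; each dummy coefficient is the
# difference between that level's mean and the level-1 mean.
X_ref = np.column_stack([ones, d2, d3])
b_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

# Intercept + all k dummies: the dummy columns add up to the intercept column,
# so the design matrix has 4 columns but is not of full column rank.
X_trap = np.column_stack([ones, d1, d2, d3])
rank_trap = np.linalg.matrix_rank(X_trap)

# No intercept + all k dummies: each coefficient is simply a group mean.
X_noint = np.column_stack([d1, d2, d3])
b_noint, *_ = np.linalg.lstsq(X_noint, y, rcond=None)

print("intercept + 2 dummies:", b_ref)
print("columns vs rank with intercept + 3 dummies:", X_trap.shape[1], rank_trap)
print("no intercept + 3 dummies:", b_noint)
```

With an intercept and all three dummies the coefficients are not uniquely determined, which is why one dummy has to be dropped.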
 
#5
I see. Thank you, ondansetron, for the clarification. So if I understand correctly, I can add all my independent variables to the "Independent" box in the Linear Regression window of SPSS (except for the one dummy variable that is the reference category)?
 

ondansetron

TS Contributor
#6
You could do this. I would recommend entering all non-dummy variables in the first "block". Then, in "block 2", add the k-1 dummies (the ones that belong to a single qualitative variable). This will give you a partial F-test for the dummies as a set, i.e. a test of the categorical variable itself (assuming you planned on testing that variable). It should also give you the estimates for the block 1 model, and then for the model containing both block 1 and block 2. Feel free to post some output screenshots if you'd like!
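
To make the block idea concrete, here is a rough sketch of the partial F-test in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated, and the names (`X_block1`, `X_block2`, `rss`) are hypothetical. It is not what SPSS runs internally, just the same comparison of a reduced and a full model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=(n, 3))            # three continuous predictors (simulated)
group = rng.integers(1, 4, size=n)     # categorical variable with levels 1, 2, 3
d2 = (group == 2).astype(float)        # level 1 is the omitted reference level
d3 = (group == 3).astype(float)
y = 1.0 + x @ np.array([0.5, -0.3, 0.2]) + 1.5 * d2 + 3.0 * d3 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ones = np.ones(n)
X_block1 = np.column_stack([ones, x])           # block 1: continuous variables only
X_block2 = np.column_stack([ones, x, d2, d3])   # block 1 + block 2: add the k-1 dummies

rss_reduced = rss(X_block1, y)
rss_full = rss(X_block2, y)

q = X_block2.shape[1] - X_block1.shape[1]       # number of dummies being tested
df_resid = n - X_block2.shape[1]                # residual df of the full model

# Partial F: improvement in fit per added dummy, relative to the full model's
# residual variance. Compare against an F(q, df_resid) distribution.
F = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(f"partial F = {F:.3f} on ({q}, {df_resid}) degrees of freedom")
```

The statistic tests both dummies at once, which is the test of the categorical variable as a whole rather than of any single level.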
 
#7
This is good advice.

To add a little more: the reference category should "make sense" rather than being chosen at random. When you interpret your parameter estimates, every dummy coefficient is a difference relative to that category.
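
A small sketch of what the choice of reference changes and what it does not, again in Python with NumPy on made-up data. The helper `fit_with_reference` is hypothetical and exists only for this illustration.

```python
import numpy as np

# Hypothetical toy data: a 3-level category (coded 1, 2, 3) and an outcome y.
group = np.array([1, 1, 2, 2, 3, 3])
y = np.array([10.0, 12.0, 15.0, 17.0, 20.0, 24.0])

def fit_with_reference(ref):
    """OLS with an intercept and a dummy for every level except `ref`."""
    levels = [g for g in (1, 2, 3) if g != ref]
    dummies = [(group == g).astype(float) for g in levels]
    X = np.column_stack([np.ones_like(y)] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

beta_ref1, fitted_ref1 = fit_with_reference(1)   # level 1 as reference
beta_ref3, fitted_ref3 = fit_with_reference(3)   # level 3 as reference

print("reference = 1:", beta_ref1)   # intercept, level 2 vs 1, level 3 vs 1
print("reference = 3:", beta_ref3)   # intercept, level 1 vs 3, level 2 vs 3
print("same fitted values:", np.allclose(fitted_ref1, fitted_ref3))
```

The coefficients change because they answer different "compared with what?" questions, while the fitted values are identical, so the choice is about interpretation rather than fit.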
 
#10
It gives you the group means directly, would tell you straight off if there was a significant difference in the group means, and post hoc tests would tell you which means are different. The groups are all treated equally, so there is no problem about which variable to leave out. Easy to understand, do and interpret.
 
#11
Maybe I'm missing something, but I'm pretty sure you can do all of these things with dummies in OLS. I could be mistaken, though, and may have been using proc GLM those times. :D
 
#12
You're right. You can do all those things with dummy variables, and the GLM actually does it that way.
The advantage of the GLM is that it does all those things for you. If you had to do a one-way ANOVA, you could use regression on dummy variables, but it's so much easier to just run the ANOVA and look at the results. The GLM effectively does a regression on the continuous variables and an ANOVA on the categorical one at the same time, and gives you the appropriate regression or ANOVA results.
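
As a quick check of the equivalence, here is a sketch in Python with NumPy that computes the one-way ANOVA F statistic by hand and then again from a regression on dummies. The data are invented for illustration and there are no continuous covariates in this toy case.

```python
import numpy as np

# Hypothetical toy data: three groups of three observations each.
group = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([10.0, 12.0, 11.0, 15.0, 17.0, 16.0, 20.0, 24.0, 22.0])
n, k = len(y), 3
levels = (1, 2, 3)

# Classic one-way ANOVA: between-group and within-group sums of squares.
grand_mean = y.mean()
ss_between = sum(
    (group == g).sum() * (y[group == g].mean() - grand_mean) ** 2 for g in levels
)
ss_within = sum(((y[group == g] - y[group == g].mean()) ** 2).sum() for g in levels)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# The same test as a regression: intercept plus k-1 dummies (level 1 omitted).
d2 = (group == 2).astype(float)
d3 = (group == 3).astype(float)
X = np.column_stack([np.ones(n), d2, d3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
rss = resid @ resid                      # residual SS equals the within-group SS
tss = ((y - grand_mean) ** 2).sum()      # total SS around the grand mean
f_reg = ((tss - rss) / (k - 1)) / (rss / (n - k))

print("ANOVA F:", f_anova)
print("regression F:", f_reg)
```

The overall F-test of the dummy regression is the one-way ANOVA F-test; adding continuous predictors to `X` is what turns this into the combined model described above.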
 
#13
I don't use SPSS often, but I think I recall using the "block" feature to get the nice subset tests, since I prefer to use dummies. That's just how I was taught, although if you know what you're doing you can make sure you end up in the same place. Otherwise, I think you can leave in the original variable, tell SPSS it's a qualitative (QL) variable, and it will do some coding automatically and give you the subset test. In SAS, which I'm more familiar with, I think I do recall what you're saying. The proc GLM will give you Type III sums of squares output for any of the class variables. I think in proc reg I just use the test statement and specify the subset tests I want or any of the extra output.

I always get a kick out of the confused look people get when you tell them they can use a regression model to do an ANOVA (and it gets better when they see the connection)! Then the fun is explaining how that model underlies the ANOVA and showing that they get some different utility from running one versus the other (like accounting for other independent variables).

Thanks for clarifying, though. I thought I was missing something!
 
#14
In Minitab and DataDesk you just put in the model and say which variables are factors and which are covariates, and away it goes.