Dear Colleagues,
I am running a multivariate linear regression analysis but I'm stuck on the following. I have one dependent variable (which is continuous) and several independent variables. Three of them are continuous and three of them are dummy variables (coded 1 or 0).
Could you please help me with how I should treat these dummy variables in my analysis? The only thing I am sure about is that I should include only two of my three dummies in the regression (according to the k-1 rule). Is this enough, or do I need to take any further steps? Also, is the omitted dummy variable a "reference variable"?
I use SPSS but also have some basic R knowledge, so I could try the steps in both!
Thank you very much for your help in advance!
Are the dummy variables related? Like, are they indicators for the same categorical variable?
A small point and then some help: a multivariate regression has multiple dependent variables while a multivariable regression has multiple independent variables with one dependent variable.
As for the dummies, it looks like you set them up correctly. If you fit an intercept, the k-1 rule works perfectly; without an intercept, you would need all 3 dummies. You're spot on that the category whose dummy is omitted becomes the reference level, i.e. the case where both included dummies are 0 (0,0).
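To make the k-1 rule concrete, here is a minimal sketch in Python with numpy, assuming (as the reply does) that the three dummies are indicators for one 3-level categorical variable. The data and level names ("A", "B", "C") are made up for illustration. It checks the rank of three design matrices: intercept plus all 3 dummies (rank-deficient, the "dummy variable trap"), intercept plus 2 dummies, and all 3 dummies with no intercept.

```python
import numpy as np

# Made-up data (assumption): a response y and one 3-level categorical variable.
y = np.array([1.0, 3.0, 4.0, 6.0, 10.0, 12.0])
group = np.array(["A", "A", "B", "B", "C", "C"])
levels = ["A", "B", "C"]

# One 0/1 indicator column per level.
dummies = np.column_stack([(group == lvl).astype(float) for lvl in levels])
ones = np.ones((len(y), 1))

designs = {
    # The 3 dummies sum to the intercept column, so this design is rank-deficient.
    "intercept + 3 dummies": np.hstack([ones, dummies]),
    # k-1 rule: drop "A", which is then coded (0, 0) and becomes the reference level.
    "intercept + 2 dummies": np.hstack([ones, dummies[:, 1:]]),
    # No intercept: all 3 dummies are needed, and each coefficient is a group mean.
    "no intercept, 3 dummies": dummies,
}
ranks = {name: int(np.linalg.matrix_rank(X)) for name, X in designs.items()}
for name, X in designs.items():
    print(f"{name}: {X.shape[1]} columns, rank {ranks[name]}")

# With the k-1 coding, the intercept is the mean of the reference group "A"
# and each dummy coefficient is that group's difference from "A".
beta_k1, *_ = np.linalg.lstsq(designs["intercept + 2 dummies"], y, rcond=None)
beta_noint, *_ = np.linalg.lstsq(designs["no intercept, 3 dummies"], y, rcond=None)
print("intercept + 2 dummies:", beta_k1)
print("no intercept:", beta_noint)
```

Both full-rank codings describe the same fitted values; they only differ in what the coefficients mean (differences from the reference group versus raw group means).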
You could do this. I would recommend entering all the non-dummy variables in the first "block", then adding the k-1 dummies (those belonging to one qualitative variable) in "block 2". This gives you a partial F-test for the dummies that make up that categorical variable (assuming you planned on testing that variable). It should also give you the model estimates for block 1 alone and for blocks 1 and 2 together in the same output. Feel free to post some output screenshots if you'd like!
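The block 1 / block 2 comparison boils down to a nested-model partial F-test. Below is a minimal numpy sketch of that arithmetic on simulated data (the coefficients, sample size and random seed are assumptions, not anything from the thread): fit the reduced model (intercept + continuous variables), fit the full model (adding the k-1 dummies), and compare residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
# Simulated data (assumption): three continuous predictors and a 3-level factor.
X_cont = rng.normal(size=(n, 3))
group = np.repeat([0, 1, 2], n // 3)  # level 0 acts as the reference level
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
y = 1.0 + X_cont @ np.array([0.5, -0.3, 0.8]) + 1.5 * d1 - 0.7 * d2 + rng.normal(size=n)


def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


ones = np.ones((n, 1))
X_block1 = np.hstack([ones, X_cont])                        # block 1: intercept + continuous
X_block2 = np.hstack([X_block1, d1[:, None], d2[:, None]])  # block 2: add the k-1 dummies

rss_reduced = rss(X_block1, y)
rss_full = rss(X_block2, y)
q = 2                              # number of dummies added in block 2
df_resid = n - X_block2.shape[1]   # residual df of the full model
F = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(f"partial F = {F:.2f} on ({q}, {df_resid}) df")
```

If SciPy is available, `scipy.stats.f.sf(F, q, df_resid)` gives the p-value. As far as I know, SPSS reports this same test as "F Change" when the "R squared change" statistic is requested for a blockwise regression.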
Last edited by ondansetron; 04-03-2017 at 01:08 PM.
How about leaving the categorical variable as categorical and doing a general linear model?
It gives you the group means directly, tells you straight away whether there is a significant difference among the group means, and post hoc tests tell you which means differ. All groups are treated equally, so there is no question of which dummy to leave out. It's easy to understand, run and interpret.
You're right. You can do all those things with dummy variables, and the GLM actually does it that way.
The advantage of the GLM is that it does all those things for you. If you had to do a one-way ANOVA, you could use regression on dummy variables, but it's much easier to just run the ANOVA and look at the results. In effect, the GLM simultaneously fits a regression on the continuous variables and an ANOVA on the categorical one, and gives you the appropriate regression or ANOVA results.
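The equivalence between a one-way ANOVA and a regression on k-1 dummies can be checked directly. This is a minimal numpy sketch with made-up numbers (not output from SPSS's GLM or any other package): it computes the classical ANOVA F from between- and within-group sums of squares, then the overall regression F from a dummy-coded fit, and recovers the group means from the regression coefficients.

```python
import numpy as np

# Small made-up data set (assumption): one response, one 3-level factor.
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 12.0, 13.0, 14.0])
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
k = 3
n = len(y)

# Classical one-way ANOVA from between- and within-group sums of squares.
grand_mean = y.mean()
means = np.array([y[group == g].mean() for g in range(k)])
counts = np.array([(group == g).sum() for g in range(k)])
ss_between = float(np.sum(counts * (means - grand_mean) ** 2))
ss_within = float(sum(((y[group == g] - means[g]) ** 2).sum() for g in range(k)))
F_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Same analysis as a regression on k-1 dummies (group 0 is the reference level).
X = np.column_stack([np.ones(n)] + [(group == g).astype(float) for g in range(1, k)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
ss_resid = float(np.sum((y - fitted) ** 2))
ss_model = float(np.sum((fitted - grand_mean) ** 2))
F_reg = (ss_model / (k - 1)) / (ss_resid / (n - k))

# Group means from the coefficients: reference mean, then reference + each offset.
means_from_reg = np.concatenate([[beta[0]], beta[0] + beta[1:]])
print("ANOVA F:", F_anova, "regression F:", F_reg)
print("group means:", means, "from regression:", means_from_reg)
```

Adding continuous columns to `X` is what turns this into the combined regression/ANOVA model described above; the dedicated ANOVA or GLM procedures simply do this bookkeeping for you.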
ondansetron (04-04-2017)
I don't use SPSS often, but I recall using the "block" feature to get the nice subset tests, since I prefer to use dummies. (That's just how I was taught; if you know what you're doing, you can make sure to end up in the same place either way. Otherwise, I think you can leave in the original variable, tell SPSS it's qualitative, and it will do the coding automatically and give you the subset test.) In SAS, which I'm more familiar with, I do recall what you're describing: PROC GLM gives you Type III sums of squares output for each of the CLASS variables. In PROC REG, I think I just use the TEST statement to specify the subset tests I want, along with any extra output.
I always get a kick out of the confused look people get when you tell them they can use a regression model to do an ANOVA (and it gets better when they see the connection)! Then the fun is explaining how that model underlies the ANOVA and showing that running one versus the other gives them different things (like being able to account for other independent variables).
Thanks for clarifying, though. I thought I was missing something!
In Minitab and DataDesk you just put in the model and say which variables are factors and which are covariates, and away it goes.