I have 10 binary dependent variables (disease prevalence: yes/no, 1/0). I want to run a logistic regression on each of them with the following categorical independent variables:
- Year (8 levels)
- Gender (2 levels)
- Age (11 levels)
- Season (2 levels)
- Location (10 levels)
After hours of Google research I am still confused by certain things. So here are my questions:
(Notes: I use SPSS 20; for the questions, let's use 'Year' as the independent variable)
1) Analysis Choice:
- Just to be sure: binary logistic regression is the analysis of choice, right? (Because all the variables are categorical)
2) Dummy Coding:
- When performing LogReg, do I have to make dummy variables or not? I tried it with and without dummy codes and the results are completely different.
3) Covariate Types/Options:
- In SPSS - having desperately tried all options - I noticed there is a difference between marking and not marking variables/covariates as 'categorical'. What exactly is the difference and when should I use which option?
4) Contrast/Reference Category:
- In SPSS you can set the Contrast and Reference Category. I assume that Contrast should remain on 'Indicator' (alternatives are: Simple, Difference, Helmert, Repeated, Polynomial and Deviation) since I've never read anything about changing that. Right?
- Reference category means the category you compare the others with, right? But that is awfully confusing for me in this setting. Suppose I use 10 dummy variables to express year: what does it mean that I compare '2004=0' with '2004=1' and '2005=0' with '2005=1', etc.? And what if I do not use binary dummy variables but instead use the single 'year' variable with its 10 levels?
- To compare all the subsequent years with the first/lowest one (i.e. compare 2001-2009 with 2000), I should (obviously?) use 'reference category = first'?
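To make question 4 concrete, here is a minimal sketch of what I understand indicator coding with 'reference category = first' to mean. It is plain Python rather than SPSS, and the function name `indicator_code` and the toy years are made up purely for illustration. The idea: a variable with k levels becomes k-1 columns of 0/1, and the reference year is the row where every column is 0.

```python
def indicator_code(values, reference):
    """Return (kept_levels, rows): one 0/1 column per non-reference level."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference must be one of the observed levels")
    # The reference level gets no column of its own.
    kept = [level for level in levels if level != reference]
    # Each observation has a 1 in the column of its own level, 0 elsewhere.
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Toy data (hypothetical): five observations over three years.
years = [2000, 2001, 2002, 2001, 2000]
columns, design = indicator_code(years, reference=min(years))

print("columns:", columns)
for year, row in zip(years, design):
    print(year, row)
```

If this picture is right, each year's coefficient would be the difference in log odds between that year and 2000, which is the comparison I am after.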
5) Method/Backward Elimination:
- For each of the disease prevalences I have to use backward elimination to reduce the model, removing interactions with p > .15.
- Though I know I have to do it, I do not know which of the three stepwise backward methods I should use (Conditional, LR or Wald).
6) Making Sense:
- A thorough explanation of the "why (not)" and "what" of the above issues would be wildly appreciated! This forum is awesome; I fully intend to remain an active lurker, or even post, in order to increase my stat proficiency.
Thanks in advance!