Hey guys,

So I've been battling with Logistic Regression in SPSS for about a week now, and I'm getting a bit fed up.

To set the scene: I have a dependent variable "Have you ever drunk alcohol?" and a pool of about 300 kids, with responses to questions regarding "confidence", "like of school" and "participation in out of school activities". All of these variables are categorical.

My aim is to look at each variable separately as an exposure variable, and control for the other variables (after determining which variables are strongly associated with my dependent variable). For each exposure-dependent variable model I will obtain odds ratios (with confidence intervals around the correct estimates) that will allow me to say things like "children who strongly disagree that *'school is a nice place to be'* are twice as likely to have drunk alcohol as those who strongly agreed".

The difficulty comes when determining what variables to throw into the initial model, and then what to do with the initial model after I have thrown such variables in.

Thus far I have determined which variables are associated with the dependent variable via bivariate analysis and Cramér's V. This gave me 9 variables which all have an association (ranging from 0.1 to 0.35) with "Have you ever drunk alcohol". The next step is to determine whether any of these variables are associated with one another, because entering two strongly associated variables into the model can lead to instability etc. I constructed a 9x9 matrix of pairwise Cramér's V values, which indicated that none of these 9 variables are closely associated (all values being less than 0.4).
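For anyone who wants to reproduce this screening step outside SPSS, here is a minimal Python sketch of Cramér's V and the pairwise matrix. It assumes each variable is just a list of category labels with no missing values; the function names and the toy data are made up for illustration and are not SPSS output.

```python
import math
from collections import Counter

def cramers_v(x, y):
    """Cramér's V for two equal-length sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # row totals
    col = Counter(y)             # column totals
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

def association_matrix(columns):
    """Pairwise Cramér's V for a dict of {variable name: list of labels}."""
    names = list(columns)
    return {(a, b): cramers_v(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy data, only to show the call pattern.
data = {
    "drunk_alcohol": ["yes", "no", "yes", "no", "yes", "no"],
    "likes_school": ["disagree", "agree", "disagree", "agree", "agree", "disagree"],
}
for pair, v in association_matrix(data).items():
    print(pair, round(v, 3))
```

The sketch uses the plain Pearson chi-square with no continuity correction, so small tables may not match other software to the last decimal.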

Thus all of these 9 variables should be controlled for when modelling "Have you ever drunk alcohol".

**[This is correct, is it not?]** On top of this, Gender should be controlled for.

**Interaction Terms**

What interaction terms and other confounders, if any, should be put into the model? I spent a good few hours taking all of these 9 variables and:

1. Running a logistic regression model for each pair of variables.

2. Running a logistic regression model for each pair of variables WITH the interaction term of these two variables.

3. Computing the likelihood ratio test statistic comparing these two models.

4. Using this statistical result to decide, for example, that "this particular interaction term should not be put in our initial model".
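The four steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: both predictors are collapsed to 0/1 (so the interaction has 1 degree of freedom), the model is fitted with a hand-rolled Newton-Raphson routine rather than SPSS, and the cell counts are invented. All function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def predict(beta, X):
    """Fitted probabilities for design matrix X."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]

def fit_logit(X, y, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = predict(beta, X)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[r] * row[c] for row, pi in zip(X, p))
                 for c in range(k)] for r in range(k)]
        beta = [bj + sj for bj, sj in zip(beta, solve(hess, grad))]
    p = predict(beta, X)
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return beta, ll

def lr_test_interaction(a, b, y):
    """Likelihood ratio test (1 df) for the a*b interaction of two 0/1 predictors."""
    reduced = [[1.0, ai, bi] for ai, bi in zip(a, b)]          # step 1: main effects only
    full = [[1.0, ai, bi, ai * bi] for ai, bi in zip(a, b)]    # step 2: plus interaction
    _, ll_reduced = fit_logit(reduced, y)
    beta_full, ll_full = fit_logit(full, y)
    g = max(0.0, 2.0 * (ll_full - ll_reduced))                 # step 3: LR statistic
    p_value = math.erfc(math.sqrt(g / 2.0))                    # chi-square tail, 1 df
    return g, p_value, beta_full

# Invented counts: (a, b) -> (drank, did not drink)
cells = {(0, 0): (10, 30), (1, 0): (20, 20), (0, 1): (20, 20), (1, 1): (25, 15)}
a, b, y = [], [], []
for (ai, bi), (yes, no) in cells.items():
    a += [ai] * (yes + no)
    b += [bi] * (yes + no)
    y += [1] * yes + [0] * no

g, p_value, beta_full = lr_test_interaction(a, b, y)
print(f"G = {g:.3f}, p = {p_value:.3f}, interaction coefficient = {beta_full[3]:.3f}")
```

Step 4 is then the judgment call on `p_value`. With four-level variables the interaction would need dummy coding and more than 1 degree of freedom, so the `erfc` shortcut for the p-value would no longer apply.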

Trouble is, I now have about 20 initial variables... and surely the hunt for other confounders and interaction terms doesn't stop here. This isn't my main issue though; my main problem is knowing what to do once I have my initial model.

Suppose I wanted to consider the exposure variable "Your teachers treat you fairly" - Strongly Agree, Agree, Disagree, Strongly Disagree (possibly collapsing it to an Agree and Disagree variable). To determine odds ratios with confidence intervals around the correct estimate we want to run a logistic regression model with "Have you ever drunk alcohol?" as the dependent variable, "Your teachers treat you fairly" as the exposure variable, and controlling for all other confounders: "Do your parents drink", "Gender", "Your school teachers treat you fairly", "School rules are too strict", etc...
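To make concrete what I mean by "odds ratios with confidence intervals": given a coefficient B and its standard error S.E. for one level of the exposure variable, the odds ratio is exp(B) and a Wald 95% interval is exp(B ± 1.96 × S.E.). A minimal Python sketch, where the function name and the numbers are hypothetical and not taken from my data:

```python
import math

def odds_ratio_ci(b, se, z=1.959964):
    """Odds ratio and Wald confidence interval from a logit coefficient and its SE."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient for "Disagree" vs. "Agree" on the exposure variable.
b_exposure = math.log(2.0)   # corresponds to an odds ratio of 2
se_exposure = 0.35           # made-up standard error

odds_ratio, lower, upper = odds_ratio_ci(b_exposure, se_exposure)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is only as trustworthy as the model that produced B and S.E., which is exactly why the choice of confounders matters here.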

I have the 9 variables that are associated with the dependent variable, one of which is the exposure variable of interest. But what do I do in SPSS to run the best model possible, so that I end up with the most appropriate/correct odds ratios and confidence intervals for my exposure variable "Your teachers treat you fairly"?

I understand the theory behind the backward and forward LR and Wald stepwise methods of building a model (I have been reading **Kleinbaum's "Logistic Regression"**, which is excellent); I just don't know which one to use (and why) in order to achieve what I want to achieve (namely the above paragraph).

The professor of the department said:

*"Run a regression model on the main effect variables (i.e. the 9 variables I have been discussing), and add interaction terms only if they increase the percentage explained (in the SPSS output)".*

But what about improving the odds ratios, and making sure you are properly controlling for the other variables (i.e. what about ensuring that the odds ratios and confidence intervals one has for the exposure variable of interest are correct)?

Thanks very much.

L-dawg
