# Thread: Linear regression with all categorical variables SPSS

1. ## Linear regression with all categorical variables SPSS

Hello, I am running a multiple regression analysis on the presence of an organism in an infection (dependent variable) and certain risk factors of the patients (independent variables).

I have organized all of the data into a list of dichotomous variables, 1 = the outcome or variable is present and 0 = the outcome or variable is not present for that patient.

The data is in SPSS as nominal variables. Is it appropriate to simply run linear regression with the outcome of 1s and 0s as the dependent variable and the risk factors (again as 1s and 0s) selected as independent variables?

Any help would be appreciated.

2. ## Re: Linear regression with all categorical variables SPSS

For dichotomous dependent variables, binary logistic regression instead of linear regression is appropriate. You can use binary variables as categorical predictors in this analysis. Small sample size and/or many predictors and/or highly correlated predictors can produce problems. What is your research objective, the research question?
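To illustrate the point about logistic regression, here is a minimal Python sketch of what the model does with 0/1 predictors: it models the log-odds of the outcome, so predicted probabilities always stay between 0 and 1, and exponentiating a coefficient gives an odds ratio. The intercept and coefficients below are made-up numbers for illustration, not estimates from any data.

```python
import math

def predicted_probability(intercept, coefficients, predictors):
    """Logistic model: P(y = 1) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear))

def odds_ratio(coefficient):
    """For a 0/1 predictor, exp(b) is the odds ratio for 1 versus 0."""
    return math.exp(coefficient)

# Hypothetical values, chosen only to show the mechanics.
intercept = -1.0
coefficients = [0.7, 1.2]

p_none = predicted_probability(intercept, coefficients, [0, 0])
p_first = predicted_probability(intercept, coefficients, [1, 0])

print(f"P(outcome | no risk factors)     = {p_none:.3f}")
print(f"P(outcome | first risk factor)   = {p_first:.3f}")
print(f"odds ratio for first risk factor = {odds_ratio(coefficients[0]):.3f}")
```

A linear regression on the same 0/1 outcome has no such constraint and can produce fitted values below 0 or above 1.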

With kind regards

K.

3. ## Re: Linear regression with all categorical variables SPSS

The overall objective is to identify risk factors for specific organisms in certain infections, and hopefully to calculate odds ratios etc.

I have a decent N (around 150) and several predictors but I may use backwards deletion to delete the highly insignificant ones. Well that is what other papers similar to mine have done. I am not entirely sure yet.

4. ## Re: Linear regression with all categorical variables SPSS

> The overall objective is to identify risk factors for specific organisms in certain infections.

So you do not need to perform a multiple regression, bivariate analyses would do, AFAICS.
Or what is your actual reason for constructing a multiple regression model?
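As a sketch of what such a bivariate analysis could look like for one dichotomous risk factor and the dichotomous outcome: the odds ratio comes straight from the 2x2 table, and an approximate 95% confidence interval can be computed on the log scale (the Woolf method, which is my choice here, not something prescribed in this thread). The counts are invented for illustration.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval from a 2x2 table.

    a = risk factor present, organism present
    b = risk factor present, organism absent
    c = risk factor absent,  organism present
    d = risk factor absent,  organism absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval is undefined when a cell is zero")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts summing to n = 150.
estimate, lower, upper = odds_ratio_with_ci(30, 20, 45, 55)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that includes 1 means the data are compatible with no association for that risk factor.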

> I have a decent N (around 150) and several predictors but I may use backwards deletion to delete the highly insignificant ones.

For this kind of research question, and with many dichotomous predictors and a dichotomous outcome, n=150 is usually far from decent, unfortunately. If you have n=75 in each outcome group, as a rule of thumb about 6-7 predictors could be included. If outcomes are far from balanced, say 50 versus 100, the number of predictors would be even smaller. And I suppose that the impact of the candidate risk factors is not large, otherwise they'd already be known?
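The arithmetic behind such a rule of thumb can be sketched as follows. I am assuming the common "events per variable" heuristic of roughly 10 to 12 cases in the smaller outcome group per predictor, which reproduces the 6-7 figure above; the exact threshold is an assumption and is not stated in the post.

```python
def max_predictors(n_outcome_present, n_outcome_absent, events_per_variable=10):
    """Rough upper bound on predictors under an events-per-variable heuristic.

    The limiting sample size is the smaller outcome group, not the total N.
    """
    limiting_group = min(n_outcome_present, n_outcome_absent)
    return limiting_group // events_per_variable

print(max_predictors(75, 75))                          # 7
print(max_predictors(75, 75, events_per_variable=12))  # 6
print(max_predictors(50, 100))                         # 5
```

The total N is 150 in every call; only the balance between the outcome groups changes the result.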

Moreover, stepwise deletion will probably produce overfitting, i.e. biased outcomes and a model which is not generalizable. https://en.wikipedia.org/wiki/Stepwi...sion#Criticism

> Well that is what other papers similar to mine have done.

Were results from these papers ever replicated? If you do what the others did, you seem to be somewhat on the safe side. On the other hand, it is not convincing.

With kind regards

K.

5. ## Re: Linear regression with all categorical variables SPSS

There are only 4 predictors that I am interested in looking at; each is already known to some degree to relate to the outcome. The reason I wanted multiple regression was to find out what the strongest predictor was in relation to the others. I was told to include data on as many predictors as the N would allow, but I agree: I didn't think it would allow more than 4 or so.
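One way to read "strongest predictor" in a multiple logistic regression is to compare adjusted odds ratios, each estimated while holding the other predictors fixed. As an illustration only, the Python sketch below fits a main-effects logistic model by Newton-Raphson on invented counts (n = 150) with two hypothetical predictors named `risk_a` and `risk_b`. In practice SPSS's binary logistic regression procedure would do the fitting, and it can also report confidence intervals, which matter as much as the point estimates when ranking predictors.

```python
import math

def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_logistic(rows, outcomes, iterations=25):
    """Maximum-likelihood fit; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(design, outcomes):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (y - p)
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)  # Newton step: H^-1 * gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def expand(cells):
    """Turn (predictor pattern, n outcome present, n outcome absent) into rows."""
    rows, outcomes = [], []
    for predictors, n_present, n_absent in cells:
        rows += [list(predictors)] * (n_present + n_absent)
        outcomes += [1] * n_present + [0] * n_absent
    return rows, outcomes

# Invented counts for each (risk_a, risk_b) pattern; total n = 150.
cells = [((0, 0), 10, 40), ((1, 0), 15, 25), ((0, 1), 12, 18), ((1, 1), 18, 12)]
rows, outcomes = expand(cells)
beta = fit_logistic(rows, outcomes)

adjusted_odds_ratios = {
    name: math.exp(b) for name, b in zip(["risk_a", "risk_b"], beta[1:])
}
for name, value in adjusted_odds_ratios.items():
    print(f"{name}: adjusted OR = {value:.2f}")
```

With all predictors coded 0/1, the adjusted odds ratios are on the same scale and can be compared directly; with n = 150 their confidence intervals will often overlap, so a ranking based on point estimates alone should be treated with caution.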
