# Missing data

#### noetsi

##### Fortran must die
My great dread. I have a logistic regression with 645 cases, and data is missing in about 228 of them. Only 6 did not answer the dependent variable; the median number missing for any predictor is 15, about 2 percent of the responses. The problem is that there are 38 predictors, and being missing on any one of them gets you thrown out of the logistic regression. (Long ago I read about an alternative where you use all the cases even when they are missing information on some questions, but the problems raised in doing so convinced me this was too dangerous.)
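To make the arithmetic concrete, here is a minimal Python sketch of how a small per-predictor missing rate compounds under listwise deletion. The counts come from the post above; using the median of 15 as a typical per-predictor count and treating missingness as independent across predictors are simplifying assumptions, not facts about this survey.

```python
# Figures quoted in the post; the independence assumption below is hypothetical.
n_cases = 645
n_incomplete = 228
n_predictors = 38
median_missing_per_predictor = 15

# Typical per-predictor missing rate (about 2 percent).
p_missing = median_missing_per_predictor / n_cases

# If missingness were independent across predictors, the share of respondents
# with every predictor observed would be (1 - p) ** k.
expected_complete_share = (1 - p_missing) ** n_predictors

# Share of cases that are actually complete according to the post.
observed_complete_share = (n_cases - n_incomplete) / n_cases

print(f"per-predictor missing rate:         {p_missing:.3f}")
print(f"expected complete under independence: {expected_complete_share:.3f}")
print(f"observed complete share:              {observed_complete_share:.3f}")
```

Under these assumptions only about 41 percent of cases would be expected to survive, against roughly 65 percent observed. That gap would be consistent with missing answers clustering in the same respondents, although the sketch alone cannot establish that.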

I am not sure what to do. I know of multiple imputation, but my understanding is that doing this with non-interval data is problematic (actually, I stopped studying it because I was told that on this board years ago). All my predictors are dummy variables, and my DV has two levels.

We are doing this to determine which variables are relatively more important. The way we do that is to see which are statistically significant (I have found no good way to address relative importance with logistic regression). I am not sure what to do with so many missing cases.

Is it reasonable, when you see an unusually high number of cases missing on a question, to remove that question because you think people did not understand it or had no answer? (In honesty, I think this is true of the specific question even ignoring all the missing responses; no one asked me about it when it was created.)

#### noetsi

##### Fortran must die
I don't really understand what that is doing. I will read the link and try to do it.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
It is showing the pattern of missingness. Everywhere there is an X, that variable is available. So the first row is the scenario where all variables are present, with 97% of people having all variable data.
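For anyone who wants to see how such a pattern table is built, here is a minimal Python sketch. The toy responses and the variable names `q1` to `q3` are hypothetical, and this is only an illustration of the idea, not the SAS code discussed in this thread.

```python
from collections import Counter

# Hypothetical toy responses; None marks a missing answer.
rows = [
    {"q1": 1, "q2": 0, "q3": 1},
    {"q1": 1, "q2": None, "q3": 0},
    {"q1": 0, "q2": 1, "q3": 1},
    {"q1": None, "q2": None, "q3": 1},
    {"q1": 1, "q2": 1, "q3": 0},
]
variables = ["q1", "q2", "q3"]


def missing_patterns(rows, variables):
    """Count how many rows share each observed/missing pattern."""
    counts = Counter(
        tuple("X" if row[v] is not None else "." for v in variables)
        for row in rows
    )
    # Most common pattern first, as in a typical pattern table.
    return counts.most_common()


patterns = missing_patterns(rows, variables)
print("  ".join(variables), "   n    pct")
for pattern, n in patterns:
    print("   ".join(pattern), f"{n:4d} {100 * n / len(rows):6.1f}")
```

Each output row is one group: an X means the variable is present, a dot means it is missing, and the count shows how many respondents fall into that group.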

#### noetsi

##### Fortran must die
I generated that, but the table is too big to show here I think. It has 122 groups and 31 variables.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I was hoping it would help inform us on the amount and pattern of missingness.

#### noetsi

##### Fortran must die
I posted it. Just looking at the raw data, I don't think there is an obvious pattern. One issue I found is that the missing values include people who did answer the question but answered don't know/NA. They are missing as far as SAS is concerned, but they did answer the question.
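A small Python sketch of that distinction, separating a "don't know" answer from a skipped question before anything is treated as missing. The codes (`1`, `0`, `"DK"`, `None`) are hypothetical, and keeping "don't know" as its own dummy is only one possible option, not a recommendation from this thread.

```python
from collections import Counter

# Hypothetical raw codes: 1 = yes, 0 = no, "DK" = don't know/NA, None = skipped.
DONT_KNOW = "DK"
raw_answers = [1, 0, "DK", None, 1, "DK", 0]


def classify(answer):
    """Label an answer as substantive, don't know, or skipped."""
    if answer is None:
        return "skipped"
    if answer == DONT_KNOW:
        return "dont_know"
    return "substantive"


# How much of the apparent missingness is really "don't know"?
tally = Counter(classify(a) for a in raw_answers)

# One option: carry "don't know" as its own dummy, so only true skips
# remain missing (None) in the recoded data.
recoded = [
    {
        "yes": None if a is None else int(a == 1),
        "dont_know": None if a is None else int(a == DONT_KNOW),
    }
    for a in raw_answers
]

print(dict(tally))
print(recoded)
```

Counting the two kinds of non-response separately shows how many cases would be lost only because a "don't know" was coded as missing.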

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Is the DV listed in that table as well?

#### noetsi

##### Fortran must die
I didn't think anyone else would be interested; that never crossed my mind.

The DV was not generated by the code you sent me, hlsmith. Was it supposed to be? I have the DV, but only six people in total are missing on it. Virtually everyone responded to that question. The reason, I think, is that the missing data is tied not to people choosing not to answer but to people saying they don't know.