# "Piped" Variables in Logistic Regression

#### scoulter

##### New Member
I am attempting to predict college enrollment using variables that are "piped" responses, where answers from a previous question are piped into subsequent questions. For example, I have a variable on whether students took the SAT (yes/no) and a piped response on whether those who took the SAT scored above 1250 or not. I was initially trying to treat taking and passing the SAT as polychotomous, but either that's not possible or I don't know how to do it. The problem arises with the follow-up question, where I only have information on those who took the test (and not on those who did not). The piped variable has a "0" for a score below 1250, a "1" for a score above 1250, and blanks for those who did not take the test. If I try to use both variables as predictors, the piped variable reduces the n in the analysis to about 1/3, because very few students actually took the SAT. Is there a way to model this "piped" data using dummy variables so that I don't reduce the n so dramatically?

#### hlsmith

##### Not a robit
It seems like you either have two different questions or have to subset the dataset. Say I am trying to predict hospital length of stay and I know breast cancer status is a risk factor for a longer stay. Well, not everyone is eligible to have breast cancer (in the traditional sense). If I run the analysis, most logistic models listwise delete men, since they have missing data. I think it is just a subset question. You could try to trick the model by using domain knowledge and putting in the median score of comparable people for the missing values, or impute them. You are probably best off just dropping the quantitative scores and creating groups (e.g., non-takers, lower scores, and higher scores), treating them as nominal rather than ordinal.
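As a rough sketch of the grouping idea above (not a definitive recipe): the two survey items can be collapsed into one three-level nominal variable and then dummy coded, with non-takers as the reference category, so nobody is dropped for having a blank. The function names, category labels, and example rows here are made up for illustration; only the 0/1/blank coding comes from the original post.

```python
from typing import Optional

def sat_group(took_sat: int, above_1250: Optional[int]) -> str:
    """Collapse the two piped items into one three-level nominal variable.

    took_sat:   1 = took the SAT, 0 = did not
    above_1250: 1 = above 1250, 0 = below, None = blank (item not shown)
    """
    if took_sat == 0:
        return "not_taker"
    if above_1250 is None:
        # A test taker with a blank score is genuinely missing data;
        # this sketch refuses to guess a category for that case.
        raise ValueError("took the SAT but the score item is blank")
    return "high_score" if above_1250 == 1 else "low_score"

def dummy_code(group: str) -> dict:
    """Two 0/1 indicators; non-takers are coded 0/0 (reference category)."""
    return {
        "sat_low": int(group == "low_score"),
        "sat_high": int(group == "high_score"),
    }

# Made-up rows of (took_sat, above_1250); None stands for a blank response.
rows = [(0, None), (1, 0), (1, 1), (0, None)]
coded = [dummy_code(sat_group(took, score)) for took, score in rows]

for row, dummies in zip(rows, coded):
    print(row, dummies)
```

Entering `sat_low` and `sat_high` as predictors (instead of the original two items) keeps every respondent in the model. Each coefficient is then a contrast against students who did not take the SAT.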

An issue you may have is omitted variable bias, meaning the reason they did not take the exam is also associated with your outcome. So excluding them just means any generalization can only be made to those who took the exam.