Most influential variable

#1
Hi,

I have a data series with 5 variables, the last being a yes or no.
I want to know which of the 4 other variables has the most influence on the last being a yes or no.
The 4 variables are all non-numeric, with about 5 options each.
How can I do this?

Thanks for the help
 
#2
By non-numeric, I assume you mean that the variables are categorical, for example Race (White, Black, Asian, etc.), where the order of the answers does not matter. In that case, the best way to work with this type of data is to recode each variable into binary variables, so you will have one variable called White (yes/no), another called Black (yes/no), and so on.

Then you can run a logistic regression with the last variable as your outcome. You should probably recode it into a numeric 0/1 variable for ease of interpretation. For your independent variables, use all but one of the new binary variables from each original variable. For example, if your race variable is White, Black or Asian, you can place Black and Asian into the model and leave White out as the reference category for the other two. You will need to do the same with the other 3 independent variables. After you run the model, look at the odds ratios and p-values to see which of these predict your outcome.

If your non-numeric variables have an order to them, for example "I don't like it", "I don't care either way", "I like it", we call them ordinal variables, because the answers fall in a natural order. You can recode these into binary variables in the same way, leaving out one of the categories at either end as the reference, for ease of interpretation.
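To make the recoding concrete, here is a minimal Python sketch of the idea, using only the standard library. Everything in it is an assumption for illustration: the toy data is made up, the helper names `dummy_code` and `fit_logistic` are hypothetical, and the fit uses plain gradient ascent as a simple stand-in. In practice a statistics package would probably do the fitting and would also report the p-values, which this sketch does not compute.

```python
import math

# Made-up toy data: one categorical predictor and a yes/no outcome.
rows = (
    [{"race": "White", "outcome": "yes"}] * 2
    + [{"race": "White", "outcome": "no"}] * 2
    + [{"race": "Black", "outcome": "yes"}] * 3
    + [{"race": "Black", "outcome": "no"}] * 1
    + [{"race": "Asian", "outcome": "yes"}] * 1
    + [{"race": "Asian", "outcome": "no"}] * 3
)


def dummy_code(rows, column, reference):
    """Build one 0/1 column per level of `column`, leaving out `reference`."""
    levels = sorted({row[column] for row in rows} - {reference})
    names = [f"{column}_{level}" for level in levels]
    matrix = [[int(row[column] == level) for level in levels] for row in rows]
    return names, matrix


def fit_logistic(X, y, steps=5000, lr=0.5):
    """Fit an intercept plus coefficients by gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "yes"
            err = yi - p
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# White is the reference category, so only Asian and Black get columns.
names, X = dummy_code(rows, "race", reference="White")
# Recode the yes/no outcome into 1/0.
y = [1 if row["outcome"] == "yes" else 0 for row in rows]

beta = fit_logistic(X, y)
# Odds ratio of each category relative to the reference (White).
odds_ratios = {name: math.exp(b) for name, b in zip(names, beta[1:])}
print(odds_ratios)
```

With these toy counts the odds ratios come out close to 3 for Black versus White and close to 1/3 for Asian versus White, which matches the raw odds in each group. With all four predictors, you would build the dummy columns for each variable the same way and place them side by side in `X`.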

Jenny Kotlerman
www.statisticalconsultingnetwork.com