Hello,
I am facing a problem that looks like a clear case for logistic regression.
I need to predict a binary dependent variable from a set of independent variables. However, almost all of the independent variables are categorical, and most of them have a large number of categories. Coding every category of every variable by hand would be very laborious. What is the usual approach when the independent variables are categorical and have a very large number of categories?
I would appreciate suggestions from any statistical modelers. I'm really stuck on this one!
Best Regards
Abhijeet
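To make the scale of the problem concrete, here is a small sketch that counts how many indicator (dummy) columns a set of categorical predictors would need under reference coding, where a variable with k levels contributes k - 1 columns. The variable names and level counts are hypothetical, chosen only for illustration.

```python
# Sketch: count the indicator (dummy) columns needed for a set of
# categorical predictors under reference coding (k levels -> k - 1
# columns). Variable names and level counts are hypothetical.

def dummy_column_count(levels_per_variable):
    """Return the number of indicator columns needed: one fewer than
    the number of levels for each variable (the omitted level is the
    reference category)."""
    return sum(k - 1 for k in levels_per_variable.values())


if __name__ == "__main__":
    # Hypothetical predictors, each with many categories.
    levels = {"region": 25, "product": 40, "channel": 12}
    n_cols = dummy_column_count(levels)
    print(f"indicator columns: {n_cols}")                   # 74
    print(f"parameters including intercept: {n_cols + 1}")  # 75
```

Each of those columns is a separate coefficient to estimate, which is why the sample size, and in particular the number of observations in the smaller outcome group, matters so much here.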
It depends on the program you are using. SAS will create the indicator (dummy) variables for you automatically if you list the categorical variables in the CLASS statement. What is your sample size, and how many observations are in each of the two outcome groups?
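For readers not using SAS, the sketch below shows roughly what that automatic expansion amounts to: reference-cell (dummy) coding, where each level except one reference level gets its own 0/1 column. This is a hand-rolled illustration, not a reproduction of SAS; the coding SAS actually uses depends on the procedure and its parameterization options. The function name `dummy_code` is hypothetical.

```python
# Sketch of reference-cell (dummy) coding, the kind of expansion a
# CLASS statement performs automatically. The exact coding used by SAS
# depends on the procedure and its options; this is only illustrative.

def dummy_code(values, reference=None):
    """Expand a list of category labels into 0/1 indicator columns.

    One level (the reference, by default the first in sorted order)
    gets no column; rows at that level are all zeros.
    Returns (column_names, rows).
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not present")
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


if __name__ == "__main__":
    colour = ["red", "blue", "green", "blue", "red"]
    names, rows = dummy_code(colour)
    print(names)  # ['green', 'red']  ('blue' is the reference level)
    for r in rows:
        print(r)
```

Whatever software does the coding, the number of columns still grows with the number of levels, so automating the coding removes the labor but not the need for enough data.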
Hi,
maybe you could code them with numbers and treat them as a quasi-continuous variable? With such a high number of distinct categories, this could work.
regards
rogojel
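A minimal sketch of the suggestion above: replace each category with an integer score so it enters the model as a single numeric predictor. This is most defensible when the categories have a natural order; for unordered categories the numbers are arbitrary, and the fitted slope depends on how they were assigned. The function name and the example ordering are hypothetical.

```python
# Sketch: integer-code categories so they can be used as one
# quasi-continuous predictor. The ordering passed in is an analyst
# choice (hypothetical here); for unordered categories it is arbitrary.

def integer_code(values, order):
    """Map each label in `values` to its 1-based position in `order`."""
    score = {level: i for i, level in enumerate(order, start=1)}
    missing = set(values) - set(score)
    if missing:
        raise ValueError(f"levels not in order: {sorted(missing)}")
    return [score[v] for v in values]


if __name__ == "__main__":
    order = ["very low", "low", "medium", "high", "very high"]
    print(integer_code(["low", "high", "medium", "low"], order))
    # [2, 4, 3, 2]
```

The appeal is that the variable costs one coefficient instead of one per level; the price is assuming a roughly linear effect on the log-odds scale across the scores.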
To some extent it depends on whether the data are ordinal or nominal. If a variable is ordinal with 20 or more levels, it may behave much like an interval variable, so you could treat it that way. If it truly is nominal, then you probably want to combine several levels into a single group before you do any analysis. For example, you might code levels 1-6 as 1, 7-13 as 2, and so on. To do this you need a theoretical, or at least common-sense, reason for the groupings. This is often done with age, for example.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
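The collapsing step described above (levels 1-6 to group 1, 7-13 to group 2, and so on) can be sketched with the standard-library `bisect` module. The cut points are assumptions supplied by the analyst; as the post says, they should come from a theoretical or common-sense reason for the grouping. The function name `collapse_levels` is hypothetical.

```python
# Sketch: collapse numbered levels into a few groups using inclusive
# upper bounds, e.g. [6, 13] -> 1-6 is group 1, 7-13 is group 2,
# anything above 13 is group 3. Cut points here are hypothetical.
from bisect import bisect_left


def collapse_levels(levels, upper_bounds):
    """Return a 1-based group number for each level.

    `upper_bounds` must be sorted ascending and lists the last level
    included in each group; levels above the last bound go into one
    extra final group.
    """
    # bisect_left counts the bounds strictly below `lvl`, so a level
    # equal to a bound stays in that bound's group.
    return [bisect_left(upper_bounds, lvl) + 1 for lvl in levels]


if __name__ == "__main__":
    print(collapse_levels([1, 6, 7, 13, 14, 22], [6, 13]))
    # [1, 1, 2, 2, 3, 3]
    # The same idea applied to ages, with hypothetical bands
    # up to 29, 30-49, 50-64 and 65+:
    print(collapse_levels([25, 30, 64, 70], [29, 49, 64]))
    # [1, 2, 3, 4]
```

If the original levels are nominal, the resulting group numbers are still just labels, so in the logistic regression they would normally be entered as a categorical variable (for example in a CLASS statement) rather than as a number.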