Predictive Model : Data Type


New Member
Hi Stats Gurus,

I am working on a model that predicts the gestation period of new hires. By gestation period, I mean the time between completing their training and starting work on their first live project. I have the gestation period in months, and I have split it into two classes: "0 - 3 months" and "3 months or more". The purpose of the model is to reduce gestation time: HR would recruit only from colleges whose past students performed well and went live early (low gestation time).

I have run a logistic regression with gestation time as the dependent variable: 1 for "0-3 months" and 0 for "3 months or more".

The independent variables are college name, educational qualification, specialization subject, graduation score, training score, etc. The college name variable covers 29 colleges, so it has 29 possible values. How can I use this as an independent variable in a predictive model? Should I code the colleges as "1-29"? Or is there a better way to group the data for this variable in logistic regression? Is there any other statistical technique you would suggest?

Thanks in advance!


Less is more. Stay pure. Stay poor.
Perhaps look into hierarchical (multilevel) models, with a random effect for the college each hire came from.
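
To give a feel for what a random effect for college does, here is a rough sketch in plain Python. It is not a fitted mixed model, only the partial-pooling idea behind one: each college's observed rate of early go-lives is pulled toward the overall rate, and colleges with few hires are pulled harder. The function name `shrunken_rates`, the `prior_strength` value, and the toy data are all made up for illustration.

```python
def shrunken_rates(outcomes_by_college, prior_strength=5.0):
    """Shrink each college's success rate toward the overall rate.

    outcomes_by_college maps a college label to a list of 0/1 outcomes
    (1 = went live within 3 months). prior_strength is an assumed tuning
    value: it acts like that many extra "average" hires per college.
    """
    total_hires = sum(len(v) for v in outcomes_by_college.values())
    total_early = sum(sum(v) for v in outcomes_by_college.values())
    overall = total_early / total_hires

    rates = {}
    for college, outcomes in outcomes_by_college.items():
        n = len(outcomes)
        # Weighted blend of the college's own data and the overall rate.
        rates[college] = (sum(outcomes) + prior_strength * overall) / (n + prior_strength)
    return overall, rates


# Toy data, invented for this example only.
toy = {
    "A": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 10 hires
    "B": [1],                             # 1 hire, raw rate 100%
    "C": [0, 0, 1, 0],                    # 4 hires
}
overall, rates = shrunken_rates(toy)
for college in sorted(rates):
    print(college, round(rates[college], 3))
```

College B has a raw rate of 100% from a single hire, but its shrunken rate sits close to the overall rate, which is the behaviour you want before ranking 29 colleges of very different sizes. A real hierarchical logistic model (for example `glmer` in R's lme4 package) estimates the amount of shrinkage from the data instead of fixing it by hand.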

Is 3 months magical? Why not treat gestation time as a continuous variable?
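
On coding the colleges as 1-29: that would treat college as an ordered numeric scale, which it is not. If you stay with ordinary logistic regression, the usual alternative is dummy (indicator) coding, leaving one college out as the reference level, so 29 colleges become 28 columns of 0/1. A minimal sketch in plain Python, where the function name `dummy_code`, the `college_` prefix, and the toy labels are my own choices for illustration:

```python
def dummy_code(values, reference=None, prefix="college_"):
    """Return (column_names, rows): one 0/1 column per level except the reference."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # default: first level alphabetically
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")

    kept = [level for level in levels if level != reference]
    names = [prefix + str(level) for level in kept]
    # A hire from the reference college gets all zeros.
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return names, rows


# Toy data, invented for this example only.
colleges = ["A", "B", "C", "A", "C"]
names, rows = dummy_code(colleges, reference="A")
print(names)
for row in rows:
    print(row)
```

Each coefficient is then that college's log-odds of an early go-live relative to the reference college. With 29 colleges and some of them contributing only a few hires, those 28 coefficients can be very unstable, which is one reason to consider the random-effects route instead.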