- Thread starter zeloc

I see you're a pretty new member. Welcome aboard! You haven't given us much to go on, though. Since you're new, I suggest having a look at the Posting Guidelines (especially 5 and 6). This will help you get better responses.

You've limited us to regression, but if that isn't necessary, perhaps a simple Spearman rank correlation (or Kendall's tau) would be useful. But we don't know enough about your data yet to move forward.
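
For reference, here is a minimal sketch of what a Spearman rank correlation computes: rank both variables (ties share the average rank), then take the ordinary Pearson correlation of the ranks. The data and the names `admission_score` and `outcome_rank` are made up for illustration; nothing here comes from the original poster's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: an admission score and the later outcome rank (1 = best).
admission_score = [3.9, 3.1, 3.5, 2.8, 3.7]
outcome_rank = [1, 4, 3, 5, 2]
rho = spearman(admission_score, outcome_rank)
```

In this toy example the highest score belongs to the best-ranked graduate and so on down the list, so `rho` comes out at -1 (higher score, numerically lower rank).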

I have data on 100 subjects (graduates of an educational program) that I was planning on ranking 1-100. There are a variety of predictors, both ordinal and continuous. I am considering building a regression model to predict desirable candidates from information that was available before admittance. It is necessary to rank all prospective candidates, so I cannot change the outcome. If something else were produced I suppose I could convert it to a ranking, but there is no real quantitative outcome that I could invent because there are a lot of intangibles. I didn't consider possibilities other than regression, although I would be interested in knowing what they are. At least conceptually, though, I am still curious what the best way is to approach this within regression. Linear seems like an option, but since the data are obviously nonnormal I would have to invoke the central limit theorem. Thanks for your feedback; I hope this clarifies.

I have worked extensively (as in 10-hr workdays for several yrs) with the majority of individuals and have observed all aspects of their performance. Some are truly phenomenal and some are okay, so it will not be difficult to come up with a fairly good ranking, although toward the middle and bottom there might be more ties. Another possibility would be to just identify the really stellar candidates (definitely fewer than 10 out of the 100); if a model could do this, then someone could go through however many candidates it selects and rank them manually.

Since it will be used as a predictive model, I am not really interested in the relationship between the individual predictors and the outcome; the overall model is more important. With regression this is also much more convenient, because I don't have to worry about confounding, etc.

Logistic regression is an option, but it is no easier than linear, and since the outcome is going to be a ranking, linear would be better. So, staying within regression, I am still wondering whether there is a procedure that takes a ranking as the outcome, or whether linear regression invoking the central limit theorem is the best choice.

May I ask why you want linear regression? Is it because of unfamiliarity with other techniques or lack of access to programs that could do the analysis? If this is the case we could point you to free resources for both direction and programs to run the test.

But perhaps I misunderstood how the DV is going to be coded.

If you have a hundred distinct ranks (that is a hundred different levels the dependent variable can take on) then logistic regression is not a good idea.

ps- linear regression on ranks is not a great idea. for instance, what's gonna happen when people get predicted rankings that are either below 1 or above 100? it could well happen in linear regression. or if you get someone whose predicted ranking is 50.001 and another one whose is 50.0015... who gets 50 and who gets 51? i think a lot of things stop being meaningful and the ranking would rely more on your judgement than on your analysis... in which case why run the analysis in the first place, right?
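
To make the out-of-range point concrete, here is a small sketch with made-up numbers. The `scores` variable is a hypothetical admission score constructed to be perfectly linear in rank, which is not a claim about the real data. A straight line fitted to ranks 1-100 has no way of knowing that ranks stop at 1 and 100, so any applicant outside the range of the training scores gets an impossible predicted rank.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical training data: 100 graduates, rank 1 = best, and a made-up
# admission score that falls steadily as the rank gets worse.
ranks = list(range(1, 101))
scores = [90 - 0.5 * r for r in ranks]  # scores run from 89.5 down to 40.0

intercept, slope = fit_line(scores, ranks)

# New applicants whose scores fall outside the range seen in training.
pred_strong = intercept + slope * 95.0  # stronger than anyone in the training set
pred_weak = intercept + slope * 30.0    # weaker than anyone in the training set
```

Here the fitted line is rank = 180 - 2 * score, so `pred_strong` lands around -10 and `pred_weak` around 120, neither of which is a valid rank.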

I'm not familiar with generalized linear models, if this would be appropriate I can look into it more.

if i may, i would look at this problem in a different way. so you are ranking people... to rank people you need to have assessed their performance in some way. what are these performance assessments? are they tests like you'd do an exam and get a grade? reaction times, maybe (whoever finishes first gets the highest grade or something)? a more historical performance? i would work on the results before the rankings and, once i get a prediction on the performance, i'd rank accordingly.

To the second point, it doesn't matter if one is 50.001 and the other is 50.0015; this can all be converted into a ranking. I don't think the results would end up relying more on judgment than on the regression. If the two candidates are that similar, it's not going to be any easier for a person to make the decision, and considering there are thousands of applicants, it would be much easier with a regression. Of course, I could look at the R-squared or another measure to see how well the regression handles the problem.
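
The conversion described here is just a sort. A minimal sketch, with hypothetical applicant labels and predicted values (lower predicted value = better):

```python
def to_ranking(predicted):
    """Map applicant -> position (1 = best), where a lower predicted value is better."""
    ordered = sorted(predicted, key=predicted.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical predictions, including the near-tie and an out-of-range value.
predicted = {"A": 50.0015, "B": 50.001, "C": 12.3, "D": 104.7}
ranking = to_ranking(predicted)
```

The sort always produces an order, so B ends up one place ahead of A even though their gap of 0.0005 is probably far smaller than the model's prediction error. Comparing gaps like that against the typical residual size would be one way to see how much of the ordering the model actually supports.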

I still don't see why OLS would be ruled out. Are you saying that the distribution is just too nonnormal for the CLT?

Is CART an option?

The other question is, if there is no optimal approach, what would be the best? Thanks for everyone's feedback.

How would I do this? It would seem that I would need to choose a mean and SD, the mean is arbitrary but how would I choose a SD?

http://www.psych.cornell.edu/Darlington/transfrm.htm#median

Does anyone know how I would apply one of these procedures in SAS?

In order to get more power, I think the simplest would be to just convert the rankings into a normal distribution and run the regression on this. The link I posted above sounds like it describes exactly what I want to do but I'm not sure how to do this in SAS.
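
One common way to turn ranks into normally distributed scores is a rank-based inverse normal transformation. The sketch below uses Blom's formula, z = Φ⁻¹((r − 3/8) / (n + 1/4)); this is my assumption about what is wanted, and I have not checked that it matches the procedure on the linked page. It is written in Python rather than SAS purely for illustration.

```python
from statistics import NormalDist


def blom_scores(ranks):
    """Rank-based normal scores: z = Phi^-1((r - 3/8) / (n + 1/4))."""
    n = len(ranks)
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


ranks = list(range(1, 101))  # the 100 graduates, 1 = best
z = blom_scores(ranks)       # rank 1 gets the most negative score
```

On the mean and SD question: the scores come out on the standard normal scale (mean 0, SD 1), and picking any other mean or SD would only rescale the outcome linearly, which does not change the order of the OLS predictions. In SAS, `PROC RANK` with the `NORMAL=BLOM` option appears to produce this kind of score directly, though that is worth checking against the SAS documentation.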