Continuous (Ordinal) Outcome with Middle Group as Target

hlsmith

Not a robit
#1
I had someone send me an idea for a project. The dependent variable is continuous, but they are interested in predicting the middle region. Imagine values ranging from 0 to infinity that they want to trichotomize into 0-10, 11-20, and over 20, with 11-20 as the target. The context is that middle values are ideal, while low or high values have detrimental effects.
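As a rough illustration of the proposed trichotomization, here is a minimal Python sketch. It assumes the cut-points are applied to a continuous concentration as y ≤ 10 (low), 10 < y ≤ 20 (target), and y > 20 (high); how values between 10 and 11 are handled is an assumption, since the ranges above are written as integers. The function name `trichotomize` is hypothetical.

```python
def trichotomize(y: float, low_cut: float = 10.0, high_cut: float = 20.0) -> str:
    """Map a continuous value to 'low', 'target', or 'high' (hypothetical cut rule)."""
    if y < 0:
        raise ValueError("concentration cannot be negative")
    if y <= low_cut:
        return "low"
    if y <= high_cut:
        return "target"
    return "high"


values = [0.0, 9.8, 10.5, 15.0, 20.0, 20.1, 37.0]  # made-up concentrations
print([trichotomize(v) for v in values])
```

Note that 0.0 and 9.8 land in the same category, as do 20.1 and 37.0; that is the information discarded by the binning.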

Any recommendations? I don't want to be confined to logistic-based models if there are better options.

There are obviously multinomial and ordinal models. I would start by modeling the data as continuous to examine linearity. But I wanted to see if I was missing anything, or if anyone had run one of these before with the target group in the middle. I get that I may also need to run a heterogeneity of odds test (i.e., check the proportional odds assumption) if using ordinal logistic regression.
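One thing worth keeping in mind with an ordinal (proportional odds) model when the target is the middle category: even though the model is monotone in the linear predictor, the probability of the middle category rises and then falls. A minimal sketch, assuming a three-level cumulative-logit parameterization P(Y ≤ j) = logistic(theta_j − eta) with hypothetical cut-points theta1 < theta2 and linear predictor eta = x'beta:

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(eta: float, theta1: float, theta2: float) -> tuple[float, float, float]:
    """Category probabilities under a 3-level cumulative-logit (proportional odds) model.

    P(Y <= low)    = logistic(theta1 - eta)
    P(Y <= target) = logistic(theta2 - eta)
    """
    if not theta1 < theta2:
        raise ValueError("cut-points must be increasing")
    cum_low = logistic(theta1 - eta)
    cum_target = logistic(theta2 - eta)
    return cum_low, cum_target - cum_low, 1.0 - cum_target


# Hypothetical cut-points; P(target) is largest when eta sits between them.
for eta in (-3.0, 0.0, 3.0):
    low, target, high = category_probs(eta, theta1=-1.0, theta2=1.0)
    print(f"eta={eta:+.1f}  low={low:.3f}  target={target:.3f}  high={high:.3f}")
```

In practice the cut-points and beta would be estimated from data; the numbers here are placeholders chosen only to show the rise-and-fall shape of P(target).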

@Miner or @GretaGarbo this seems like something you all may have experience with - maybe.
 
#2
....though they are interested in predicting the middle region.
So they prefer values in the range 11-20? Does this also mean they might prefer a single value, like 15? Then they could use a quadratic loss function such as (y - 15)^2, so that larger deviations from 15 are penalized more heavily.
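The quadratic loss has a useful property: its average over a group splits into an off-target (bias) term and a spread (variance) term, mean of (y − T)^2 = (ȳ − T)^2 + s^2, where s^2 is the population variance. A minimal sketch with target T = 15 and made-up values; the helper names are hypothetical.

```python
from statistics import fmean, pvariance


def mean_quadratic_loss(values: list[float], target: float = 15.0) -> float:
    """Average of (y - target)^2 over the observed values."""
    return fmean((y - target) ** 2 for y in values)


def loss_decomposition(values: list[float], target: float = 15.0) -> tuple[float, float]:
    """Split the mean quadratic loss into (bias^2, population variance)."""
    bias_sq = (fmean(values) - target) ** 2
    return bias_sq, pvariance(values)


concentrations = [9.0, 12.5, 14.0, 16.0, 18.5, 24.0]  # hypothetical data
bias_sq, var = loss_decomposition(concentrations)
print(mean_quadratic_loss(concentrations), bias_sq + var)
```

Shifting the mean toward 15 shrinks the first term; reducing variability shrinks the second.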

Do they have explanatory variables that could help predict y (the dependent variable)? If not, they could just predict with the median or the mean (or is there any autocorrelation in the data?). Or do they want to investigate what influences y by running experiments? I am sure @Miner has a lot to say about robust design.
 

hlsmith

Not a robit
#3
@GretaGarbo

Yes, this seemed like a quality improvement type question that @Miner may be familiar with.

The range is ideal: it is a chemical concentration in the blood, and that range is considered good. If the concentration is below the range, the chemical is not effective; if it is above, it may be toxic. So the target is not a single value (e.g., 15) but an acceptable range.

There is no experiment; everyone gets the same protocol (dosing). However, the question is whether other variables are associated with a value being in the range. For example, is age associated with whether the protocol is likely to get a person into the range? Theoretically, young people will likely be above the range because their bodies don't clear the chemical as fast, while older people will likely have lower values since their kidneys are developed and clear the chemical.

No autocorrelation, just examining this at one time point.
 

Miner

TS Contributor
#4
Sorry for the delayed response. I just returned from a 10-day cruise from Vancouver to Hawaii.

In an industrial situation, I would treat the response as continuous with a specification of 11-20. This maximizes the information available in the data. For example, 0 is probably worse than 10, and 21 is probably better than 30, yet the ordinal categories treat them equally. This opens up many more analytical options. Once you have a model, it becomes an optimization problem.
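Treating the response as continuous with a specification can be sketched as follows, assuming (purely for illustration) that a fitted model gives each patient a normally distributed concentration with mean mu and residual standard deviation sigma. The probability of being within the 11-20 specification is then the normal probability mass between the limits, and for a fixed sigma it is maximized when mu sits at the center of the specification. The function names and numbers are hypothetical.

```python
from statistics import NormalDist


def prob_in_spec(mu: float, sigma: float, lower: float = 11.0, upper: float = 20.0) -> float:
    """P(lower <= Y <= upper) for Y ~ Normal(mu, sigma); normality is an assumption."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def best_mean_on_grid(sigma: float, grid: list[float]) -> float:
    """Grid search for the mean that maximizes the in-spec probability."""
    return max(grid, key=lambda m: prob_in_spec(m, sigma))


grid = [i * 0.5 for i in range(0, 81)]  # candidate means 0.0 to 40.0
print(prob_in_spec(mu=13.0, sigma=4.0))
print(best_mean_on_grid(sigma=4.0, grid=grid))
```

With a regression model, mu would be a function of covariates such as age, so the same calculation gives each patient an estimated probability of landing in range without discarding the continuous information. The normal, constant-sigma assumption would need checking against the data.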
 

hlsmith

Not a robit
#6
I have worked on so many projects in the past two weeks that I don't even remember the context; let me think here.

There is no experiment; everyone gets the same protocol (dosing). However, the question is whether other variables are associated with a value being in the range. For example, is age associated with whether the protocol is likely to get a person into the range? Theoretically, young people will likely be above the range because their bodies don't clear the chemical as fast, while older people will likely have lower values since their kidneys are developed and clear the chemical.
So, given my prior description, all patients receive the same dosing protocol, and at time point X everyone is supposed to have a certain blood concentration of a chemical. I believe the person who contacted me wanted to model which variables were associated with the patient being in the ideal range of 11-20 (the concentration could be as low as 0 and not much higher than, say, 30 or 40, but is desired to be in the 11-20 range). So I guess a GAM could be used, but that doesn't feel quite right; or maybe a quantile regression of some sort, or looking at the highest density interval. Hmm.
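For the quantile regression idea, the core ingredient is the check (pinball) loss: minimizing it yields a quantile rather than a mean, so one could, for instance, ask whether upper quantiles of concentration cross 20 for some patient groups. A minimal sketch for the covariate-free case, where the minimizer over a constant is a sample quantile; the data and helper names are made up for illustration.

```python
def pinball_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values: list[float], tau: float) -> float:
    """Constant q minimizing total pinball loss; a minimizer always lies at a data point."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be in (0, 1)")

    def total(q: float) -> float:
        return sum(pinball_loss(y - q, tau) for y in values)

    return min(values, key=total)


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical concentrations with one outlier
print(constant_quantile_fit(data, 0.5), constant_quantile_fit(data, 0.1))
```

A full quantile regression replaces the constant q with a linear predictor in covariates such as age; statsmodels' `QuantReg`, for example, fits that model by minimizing this same loss.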