# Nominal predictor with 70 levels

#### noetsi

##### Fortran must die
I have a nominal-level predictor with 70 levels (it's units that provide some service). I want to analyze how these units perform on an interval response variable (I will use linear regression) and a two-level response variable (I will use logistic regression). I could of course compare one level against the other 69, 70 times over, but this seems less than ideal because of family-wise error and because I don't know what I would really learn that way. I really want to compare every level against every other level.

I was wondering if anyone had dealt with this type of issue before. I am trying to see how well units did relative to each other while controlling for other variables. I could of course just do descriptive statistics, but I prefer not to because you really can't control for other variables with descriptives.

#### rogojel

##### TS Contributor
Hi,
any chance of clustering the units first? If you could group the units in some way and use the centroids (?), that would solve the problem, imo.

#### GretaGarbo

##### Human
Do a QQ-plot of all the estimated parameters. Those that deviate from a straight line will be "real" effects, in contrast to the randomness.

#### noetsi

##### Fortran must die
I have to comment on the performance of individual units, rogojel, and I don't think I can do that with clustering (does this mean factor analysis?).

#### noetsi

##### Fortran must die
> Do a QQ-plot of all the estimated parameters. Those that deviate from a straight line will be "real" effects, in contrast to the randomness.

Why is this so, GretaGarbo? I have not seen this approach discussed before; do you have a citation or link I could look at on this topic?

Are you suggesting running the k-1 dummies and then using these as parameter estimates?

#### Jake

You could fit a mixed model treating the units as a random effect. In such a model you could throw in whatever covariates you like and examine the distribution of the random unit effects.
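A minimal sketch of Jake's mixed-model suggestion, using `statsmodels`' `MixedLM` on simulated data (the column names `y`, `x1`, and `unit` are hypothetical stand-ins for the poster's variables, and the data here is generated, not real):

```python
# Sketch: units as a random effect in a mixed model (statsmodels).
# All names and the data-generating setup are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_units, n_per = 70, 10
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_per),
    "x1": rng.normal(size=n_units * n_per),
})
unit_effects = rng.normal(scale=0.5, size=n_units)  # "true" unit deviations
df["y"] = 2.0 + 1.5 * df["x1"] + unit_effects[df["unit"].to_numpy()] \
          + rng.normal(size=len(df))

# Random intercept per unit; covariates go in the fixed-effects formula.
model = smf.mixedlm("y ~ x1", df, groups=df["unit"]).fit()

# Predicted (shrunken, BLUP-style) random effects, one per unit --
# these are the unit-level deviations you can rank, plot, or QQ-plot.
blups = {g: re.values[0] for g, re in model.random_effects.items()}
```

One appeal of this route is exactly what Jake describes: the covariates are handled in the fixed part, and the 70 unit effects come out as a distribution you can examine, rather than 2,415 pairwise tests.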

#### Miner

##### TS Contributor
> Why is this so, GretaGarbo? I have not seen this approach discussed before; do you have a citation or link I could look at on this topic?
>
> Are you suggesting running the k-1 dummies and then using these as parameter estimates?
I believe that what Greta is referring to is a variant of the half-normal plot analysis. See http://math.uhcl.edu/li/teach/stat5535/halfnormalplot.pdf

Have you considered using ANOM? The null hypothesis for ANOM is that each individual mean is the same as the grand mean. See https://cran.r-project.org/web/packages/ANOM/vignettes/ANOM.pdf
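A rough ANOM-style sketch in Python (the linked ANOM package is for R): each unit mean is compared against the grand mean, with Bonferroni-adjusted normal limits standing in for the exact ANOM critical values from tables. The shift of 2.5 in one unit is an artificial "real" effect:

```python
# Sketch of the ANOM idea: flag units whose mean deviates from the
# grand mean by more than a decision limit. The Bonferroni/normal
# critical value is a stand-in approximation, not the exact ANOM h.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, n = 70, 10
y = rng.normal(size=(k, n))   # k units, n observations each
y[0] += 2.5                   # make unit 0 genuinely different

grand = y.mean()
means = y.mean(axis=1)
# Pooled within-unit standard deviation.
s = np.sqrt(((y - means[:, None]) ** 2).sum() / (k * (n - 1)))
# Standard error of (unit mean - grand mean) for equal group sizes.
se = s * np.sqrt((k - 1) / (k * n))
h = stats.norm.ppf(1 - 0.05 / (2 * k))  # Bonferroni approximation
flagged = np.where(np.abs(means - grand) > h * se)[0]
```

The appeal for this problem is that ANOM gives 70 comparisons (each unit vs. the center), not 2,415 pairwise ones, and it has a natural "decision chart" display.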

#### GretaGarbo

##### Human
When I saw Jake's answer in post 6, I thought, "yes, of course, that would be a very good method" (and maybe even the "best"). You can sort of "throw in whatever ... you like" and the method will take care of it.

(If I remember correctly, for the James-Stein rule to be valid, the groups should be randomly selected. Maybe noetsi's 70 groups can be thought of like that. Then, by James-Stein, shrinking towards the mean will decrease the mean squared error.)

About the QQ-plots, I was thinking of this: if you generate 700 random normal numbers and put them in 70 groups, then the means of the 70 groups will also be normally distributed, and you can have a look at them with a QQ-plot (or a PP-plot or, I believe, a half-normal plot). The 70 means will lie on a straight line in the QQ-plot. Most of them will be close to the overall mean and some will be larger, but they will all be close to a straight line. My suggestion is that "real" effects will deviate from the straight line.

But I did not think about what happens if the sizes of the groups vary. Then the variances of the means will differ. I believe I have heard of methods to correct for that, but I don't remember them.

And yes, I was thinking of using QQ-plots the way they are used in 2^p factorial designs. (And I believe, but I am not sure, that half-normal plots are used just like the QQ-plots.)
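The simulation described above (700 standard-normal draws in 70 groups of 10) can be sketched with `scipy.stats.probplot`, which computes the quantile pairs a QQ-plot would draw:

```python
# Sketch of GretaGarbo's thought experiment: group means of pure noise
# should hug a straight line in a normal QQ-plot; points off the line
# would suggest "real" effects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
draws = rng.normal(size=(70, 10))   # 70 groups x 10 observations
group_means = draws.mean(axis=1)    # one mean per group

# Theoretical vs. ordered sample quantiles, plus the line-fit summary.
(osm, osr), (slope, intercept, r) = stats.probplot(group_means, dist="norm")
# r close to 1: the 70 means are consistent with pure randomness.
```

With real data, a handful of units falling well off that line (at either tail) would be the candidates for genuine effects.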

#### noetsi

##### Fortran must die
I have not worked with random effects outside multilevel models, but I will look at that. I have not heard of ANOM at all, but I will certainly look into it.

#### rogojel

##### TS Contributor
> I have to comment on the performance of individual units, rogojel, and I don't think I can do that with clustering (does this mean factor analysis?).

In this case I think you would need to include all of them in the model. It would not be necessary to compare each against each, though, if the goal is to form some ranking groups. Something like Tukey's HSD could be useful.

regards
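The Tukey HSD route is available in Python via `statsmodels`' `pairwise_tukeyhsd`, which runs all 70·69/2 = 2,415 pairwise comparisons with family-wise error control (the data here is simulated, with one unit shifted to create a real difference):

```python
# Sketch of rogojel's Tukey HSD suggestion on simulated data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
units = np.repeat(np.arange(70), 10)  # 70 units, 10 observations each
y = rng.normal(size=700)
y[units == 0] += 3.0                  # make unit 0 genuinely different

result = pairwise_tukeyhsd(endog=y, groups=units, alpha=0.05)
# result.reject is a boolean array over all 2415 pairs; significant
# pairs are the ones that survive the family-wise correction.
n_sig = int(result.reject.sum())
```

Because Tukey's HSD corrects across all pairs, it addresses the family-wise error concern from the original post, though with 70 levels its power per comparison is necessarily modest.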