Hi

For a project I need to model industrial data. A large part of the dataset consists of nominal categorical variables (of 430 variables in total, 30 are categorical).

The main goal is to use principal component regression (PCR).

The problem is that nominal (non-ordinal) categorical variables can't be used directly in PCA. I would like to solve this by replacing each category with the mean of the outcome variable for that category.

In this example I use binary (two-category) variables, but in the real dataset some categorical variables have more than 50 categories.

For example, with columns Y, X1, X2:

450 cat blue
350 dog green
700 cat green
500 dog blue

I would like to transform this dataset into the following:

450 575 475
350 425 525
700 575 525
500 425 475
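
To make the transformation concrete, here is a minimal sketch in plain Python of what I have in mind. The helper name `mean_encode` and the tuple layout (Y in position 0) are only illustrative assumptions, not part of any library.

```python
from collections import defaultdict
from statistics import mean

# Toy data from the example above: (Y, X1, X2)
rows = [
    (450, "cat", "blue"),
    (350, "dog", "green"),
    (700, "cat", "green"),
    (500, "dog", "blue"),
]

def mean_encode(rows, col):
    """Replace each category in column `col` with the mean of Y for that category.

    Hypothetical helper: assumes Y is stored in position 0 of every row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[0])  # collect Y values per category
    means = {category: mean(ys) for category, ys in groups.items()}
    return [means[row[col]] for row in rows]

x1 = mean_encode(rows, 1)  # cat/dog replaced by their mean Y
x2 = mean_encode(rows, 2)  # blue/green replaced by their mean Y

# Reassemble the numeric dataset: Y stays as it is, X1 and X2 become means
encoded = [(row[0], a, b) for row, a, b in zip(rows, x1, x2)]
for r in encoded:
    print(*r)
```

The same function should work unchanged for variables with more than 50 categories, since it only groups the rows by whatever category values appear in the column.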

Does this make sense to do? I tried searching online, but I couldn't find any information about this.