I have a dataset with several categorical variables (policy characteristics) and one numerical variable (historical losses). What I want to do is find combinations of levels of the categorical variables that split the policies into 5 groups, so that policies within a group have similar historical losses.

I have considered using regression trees because they are the only method I have seen that splits on the levels of categorical variables to reach an optimal grouping of a numerical response. Am I on the right track?

If it matters, I am working in R, but I also use SAS.
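
To make the regression-tree idea concrete, here is a minimal sketch of what I have in mind. It is written in plain Python only so that it is self-contained; in R, I understand that `rpart` with `method = "anova"` fits this kind of tree. Everything in it is an assumption for illustration: the variable names (`region`, `vehicle`, `loss`), the toy numbers, and the helper functions `sse`, `best_split` and `grow` are all made up, and the greedy "split the leaf with the largest gain until there are 5 leaves" rule is just one simple way to stop at 5 groups.

```python
from collections import defaultdict

def sse(values):
    """Sum of squared deviations from the mean (within-group spread)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, features):
    """Return (gain, feature, left_levels) for the best binary split, or None."""
    base = sse([r[target] for r in rows])
    best = None
    for f in features:
        by_level = defaultdict(list)
        for r in rows:
            by_level[r[f]].append(r[target])
        # Order levels by mean loss: with a squared-error criterion the best
        # two-way partition of the levels is contiguous in this ordering.
        levels = sorted(by_level, key=lambda k: sum(by_level[k]) / len(by_level[k]))
        for i in range(1, len(levels)):
            lo = [v for k in levels[:i] for v in by_level[k]]
            hi = [v for k in levels[i:] for v in by_level[k]]
            gain = base - sse(lo) - sse(hi)
            if best is None or gain > best[0]:
                best = (gain, f, set(levels[:i]))
    return best

def grow(rows, target, features, n_groups=5):
    """Greedily split the leaf with the largest gain until n_groups leaves exist."""
    leaves = [([], rows)]  # each leaf is (list of rules, rows in the leaf)
    while len(leaves) < n_groups:
        candidates = []
        for idx, (_, leaf_rows) in enumerate(leaves):
            s = best_split(leaf_rows, target, features)
            if s is not None and s[0] > 1e-12:
                candidates.append((s[0], idx, s[1], s[2]))
        if not candidates:
            break  # no remaining split reduces the within-group spread
        _, idx, f, left = max(candidates, key=lambda c: c[0])
        rules, leaf_rows = leaves.pop(idx)
        in_left = [r for r in leaf_rows if r[f] in left]
        in_right = [r for r in leaf_rows if r[f] not in left]
        right = sorted({r[f] for r in in_right})
        leaves.append((rules + [(f, sorted(left))], in_left))
        leaves.append((rules + [(f, right)], in_right))
    return leaves

# Toy data, invented purely for illustration (not real policy losses).
toy_losses = {
    ("north", "car"): [100, 110], ("north", "truck"): [300, 310],
    ("south", "car"): [150, 160], ("south", "truck"): [500, 510],
    ("east", "car"): [200, 210], ("east", "truck"): [900, 910],
}
rows = [{"region": reg, "vehicle": veh, "loss": x}
        for (reg, veh), xs in toy_losses.items() for x in xs]

groups = grow(rows, "loss", ["region", "vehicle"], n_groups=5)
for rules, members in groups:
    mean_loss = sum(r["loss"] for r in members) / len(members)
    print(rules, "n =", len(members), "mean loss =", round(mean_loss, 1))
```

Each printed group is described by the level sets chosen along its path, which is the kind of output I am after: rules on the policy characteristics that define 5 loss groups. I realize a greedy tree like this is not guaranteed to give the globally best 5-way grouping, which is part of why I am asking.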

Thanks,

Chris