I'm working with a very large dataset of individuals from 6 different groups/samples. The group sizes differ radically: the largest of the 6 groups makes up 83.7% of the data, while the smallest is only 0.2% of the entire sample. (That smallest group is still 6,537 records, so it's not negligible in absolute terms.)
I have a lot of analysis to do, but right now the large group seems to be swamping the other groups. What options exist for normalizing or weighting the data so that no single group overwhelms the analysis?






