Computing Mean and Confidence Interval for an Unbalanced Dataset

Hello everyone,

I am working on a project in which I am comparing a specific physiological value across a large set of species (75 species in total).

One thing I wanted to do was compute the mean and confidence interval of the data for all species together, to show that the data are centred around a specific value (we can clearly see this when plotting the distribution).

However, my dataset is highly unbalanced: some species have only 5 datapoints, while others have up to 150. I therefore wonder how to calculate this mean so that every species has the same "weight" in the calculation (otherwise, a species with 150 datapoints will obviously influence the mean far more than one with only five).

I thought about taking the mean of the per-species means, but I vaguely remember that taking a mean of means is not really a good thing to do.

Thank you already for your help :D



Active Member
So you have a bunch of values each on their own species axis, that all contribute to a desired mean value. And you've noticed they aren't delivering equal amounts toward that mean.

Check out Principal Component Analysis (PCA) for methods of determining a set of axes that best describes the differences and similarities, and for a way to reduce the dimensions down to the most influential ones with the least loss of detail.
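
As a rough illustration of what PCA does, here is a minimal sketch using NumPy's singular value decomposition. It assumes the data can be arranged as a rectangular matrix (rows are observations, columns are variables); the function name `pca` and the random data are hypothetical and only there to show the mechanics.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal axes.

    Returns the projected scores, the principal axes (one per row),
    and the fraction of total variance each kept axis explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return Xc @ components.T, components, ratio[:n_components]


# Hypothetical data: 100 observations of 3 variables,
# where the third variable is nearly a copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)

scores, components, ratio = pca(X, n_components=2)
print(scores.shape, ratio)
```

Because one variable is almost redundant here, two axes should capture nearly all of the variance, which is the "least loss of detail" part.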