
Biassing according to sample size



Ralph Lucas
12-03-2011, 07:16 AM
I have a universe of points whose mean and median values are 50, and which has a broadly bell-shaped distribution of values between 0 and 100.

Local areas of the universe have means well away from 50.

In each area of the universe, comprising 1,000 points or so, I have samples of between 1 and 100 points.

I want to produce a map, as best I can, of the known variation of local means within the universe. Unknown areas are to be represented as having a value of 50.

Where I have one sample point, the right value for the map is clearly 50 - I have no evidence otherwise. When I have 100 sample points, the probability that the sample average is the local average is high enough, given the nature of the measurements, to be assumed to be unity.

It would suit me to be able to bias the measured averages of intermediate-sized samples towards the mean, depending on the sample size. I realise that in doing this I lose the ability to discriminate between an area that has a measured value of 52 and 99 data points and one that has a measured value of 70 but only 3 data points, but that suits my purposes well enough. I am only interested in mapping proven variation.
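Written out, the request amounts to choosing a weight function of the sample size n that pulls the sample mean towards the universe mean of 50, subject to the two end points described above (no evidence at n = 1, essentially full trust at n = 100):

```latex
\hat{m}(n) = 50 + w(n)\,\bigl(\bar{x}_n - 50\bigr),
\qquad w(1) = 0, \qquad w(100) \approx 1, \qquad w \text{ non-decreasing in } n
```

The open question in the thread is what shape w(n) should take between those end points.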

There are plenty of functions with the right sort of shape to use for biassing, but is there one that has some roots in statistical theory, and would thus be fairer than my choosing a shape based on gut feel?

BGM
12-03-2011, 11:04 AM
You want a Bayes estimate? Such an estimate usually gives weight not only to the sample mean but also to the prior mean. This is also called the credibility estimate in actuarial science.
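A minimal sketch of a credibility-style estimate, assuming the textbook normal-normal setup: points within an area scatter around the local mean with variance sigma^2, and local means scatter around the universe mean of 50 with variance tau^2. Under those assumptions the posterior mean is a weighted average with weight Z = n / (n + k), where k = sigma^2 / tau^2 (the same form as Bühlmann credibility). The function name and the variances used here (sigma = 15, tau = 5, so k = 9) are hypothetical illustrations, not values from the thread; in practice both variances would have to be estimated from the data.

```python
def credibility_estimate(sample_mean: float, n: int, prior_mean: float = 50.0,
                         within_var: float = 225.0, between_var: float = 25.0) -> float:
    """Shrink a local sample mean towards the universe mean.

    within_var  -- assumed variance of individual points inside one area (sigma^2)
    between_var -- assumed variance of the true local means across areas (tau^2)
    """
    if n <= 0:
        # No samples: nothing to pull away from the universe mean.
        return prior_mean
    k = within_var / between_var        # credibility constant sigma^2 / tau^2
    z = n / (n + k)                     # weight on the sample mean, grows towards 1 with n
    return z * sample_mean + (1.0 - z) * prior_mean


if __name__ == "__main__":
    # The two cases from the opening post: 70 from 3 points, 52 from 99 points.
    for n, m in [(3, 70.0), (99, 52.0)]:
        print(n, m, round(credibility_estimate(m, n), 2))
```

With k = 9 this prints 55.0 for the 3-point area and 51.83 for the 99-point area. Note that this rule never gives exactly zero weight at n = 1 (there Z = 0.1), so it only approximately meets the "one point means 50" requirement; how quickly Z rises is governed entirely by the ratio of the two variances.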

Ralph Lucas
12-06-2011, 06:20 AM
Thank you. It helps to know the code words! I have spent an interesting couple of hours exploring Bayes estimates and credibility, and am much comforted by how much suck-finger-and-stick-in-airing seems to go on in serious statistics.

For my purposes I think a logistic curve will work best - 1/(1+e^-x) - because I do not want to give samples of fewer than 5 points much weight at all, and I do not want the largest samples to be too dominant. A bit of statistics, more of artistics.
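A minimal sketch of that logistic weighting, assuming x = (n - midpoint) / scale. The midpoint of 10 and scale of 2 are hypothetical values chosen only for illustration. The rescaling, so that one sample point gives exactly 50 and 100 points give the raw sample mean, is an added choice to match the boundary conditions in the opening post.

```python
import math


def logistic_weight(n: int, midpoint: float = 10.0, scale: float = 2.0,
                    n_min: int = 1, n_max: int = 100) -> float:
    """Weight in [0, 1] from the logistic curve 1/(1+e^-x), x = (n - midpoint)/scale.

    midpoint and scale are hypothetical tuning parameters. The curve is rescaled
    so that w(n_min) == 0 and w(n_max) == 1.
    """
    def raw(m: float) -> float:
        return 1.0 / (1.0 + math.exp(-(m - midpoint) / scale))

    lo, hi = raw(n_min), raw(n_max)
    n = min(max(n, n_min), n_max)       # clamp to the range of sample sizes seen
    return (raw(n) - lo) / (hi - lo)


def map_value(sample_mean: float, n: int, prior_mean: float = 50.0) -> float:
    """Pull the local sample mean towards the universe mean by the logistic weight."""
    return prior_mean + logistic_weight(n) * (sample_mean - prior_mean)


if __name__ == "__main__":
    for n, m in [(1, 70.0), (3, 70.0), (5, 70.0), (20, 70.0), (99, 52.0)]:
        print(n, m, round(map_value(m, n), 2))
```

With these settings a 5-point sample gets a weight of only about 0.07, while anything above about 20 points is close to full weight, so the largest samples are not much more influential than medium ones. Moving the midpoint or the scale shifts or stretches that transition.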