Statistics regarding distances from group center

SGK

New Member
#1
Hi

I am an amateur trying to recall high school/college statistics. I'm looking at shooting group analysis. I can identify the center of the group and consider the shots to represent random samples of variances from this center. I can calculate for each sample shot the distance in mm from this sample center. I can calculate the extreme spread of these. Can I calculate the mean and standard deviation in the same, conventional manner?
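As a minimal sketch of the quantities described above, the following computes the sample centre, each shot's distance from it, and the mean, SD and extreme spread. The input format (a list of (x, y) coordinates in mm), the function name `group_stats`, and the data are all made up for illustration. Extreme spread is taken here as the largest shot-to-shot distance, which is an assumption about the intended definition.

```python
import math
import statistics

def group_stats(shots):
    """shots: list of (x, y) impact coordinates in mm (hypothetical format)."""
    n = len(shots)
    # Sample centre = mean of x and mean of y
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    # Radial distance of each shot from the sample centre
    radii = [math.hypot(x - cx, y - cy) for x, y in shots]
    # Assumption: extreme spread = largest distance between any two shots
    extreme_spread = max(
        math.dist(p, q) for i, p in enumerate(shots) for q in shots[i + 1:]
    )
    return {
        "center": (cx, cy),
        "radii": radii,
        "mean_radius": statistics.mean(radii),
        "sd_radius": statistics.stdev(radii),
        "extreme_spread": extreme_spread,
    }

# Made-up example data (mm)
shots = [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5), (0.5, 3.5), (-2.5, -5.0)]
stats = group_stats(shots)
print(stats["center"], stats["mean_radius"], stats["sd_radius"], stats["extreme_spread"])
```

These are descriptive statistics only. Whether a t-test or a standard confidence interval can be applied to the radii is a separate question.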

I'm getting confused because I'm used to only what I will call one dimensional statistics. Velocity data would be an example of this. Each shot has just one velocity number. I can calculate sample mean, SD, ES etc and confidence intervals in these. I can use a t-test to see if I can be confident there's likely a real chance that two sample sets represent different underlying behaviors.

If I think about the cluster of observations on a 2D plane, I think of the 'mean' as the center of the group, with sample variances from the mean in both distance and angle. The distance from center is almost the first derivative of an (x,y) location on the 2D plane versus the center (a distance ignoring angle). (I think things simplify if we make the assumption that angular variance around the center is random but I'm not sure.)

My goal is to get to confidence intervals and t-test analysis (which I already use on the velocity data). Which group has the smallest variance from center? What's the 95% (or other) confidence interval in this deviation? Can I be x% confident that sample group A represents a different underlying population behavior than sample group B?

I don't care that there might be a shift in the center of each. (This can easily be corrected for by shifting the point of aim.) My only concern is the relative variances/dispersion between the two.

I hope my description of this makes sense! I appreciate any help.

Regards

Steve
 

katxt

Well-Known Member
#2
This is not an easy problem, Steve. The t test approach needs the distances to the centre to be normal which they won't be.
How big are the groups you are comparing?
A theoretical approach is possible if you can assume that your calculated centre is in fact the true centre, and that the shots are bivariate normal (normal in both directions). It involves the chi squared and F distributions. Let me know if this interests you.
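A sketch of what such a chi-squared/F approach might look like, under a stronger assumption than stated above: the horizontal and vertical errors are independent and share the same standard deviation σ (a circular bivariate normal). The symbols below are introduced for illustration only and are not necessarily the formulation katxt had in mind.

```latex
% Assumption: X_i ~ N(mu_x, sigma^2) and Y_i ~ N(mu_y, sigma^2), independent, centre known
\[
r_i^2 = (X_i-\mu_x)^2 + (Y_i-\mu_y)^2,
\qquad
\frac{1}{\sigma^2}\sum_{i=1}^{n} r_i^2 \sim \chi^2_{2n}
\]
% Estimate of sigma^2 for one group, and the ratio for two groups A and B
\[
\hat\sigma^2 = \frac{1}{2n}\sum_{i=1}^{n} r_i^2,
\qquad
\frac{\hat\sigma_A^2}{\hat\sigma_B^2} \sim F_{2n_A,\;2n_B}
\quad\text{when } \sigma_A = \sigma_B
\]
```

If the centre is instead estimated from the same shots, the same argument gives 2(n-1) degrees of freedom in place of 2n, with 2(n-1) as the divisor in the estimate of σ².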
My own preference, as I understand the problem at the moment, would be a bootstrap test for the difference in the two spreads. This can give a p value and a CI. kat
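A rough sketch of one way a bootstrap for the difference in spreads could be set up: a percentile bootstrap on the difference in mean radius, with the centre recomputed for every resample. The function names, the choice of mean radius as the spread statistic, and the data are assumptions for illustration. Other bootstrap variants exist, and with groups as small as 5 the interval should be treated with caution.

```python
import math
import random

def mean_radius(shots):
    """Mean distance of the shots from their own sample centre."""
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / n

def bootstrap_diff_ci(group_a, group_b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean_radius(A) - mean_radius(B)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample whole shots (x, y pairs) with replacement within each group
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean_radius(resample_a) - mean_radius(resample_b))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Made-up data: 20 shots each, per-axis SD 1 mm vs 4 mm (illustrative only)
rng = random.Random(42)
tight = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
wide = [(rng.gauss(0, 4), rng.gauss(0, 4)) for _ in range(20)]
lo, hi = bootstrap_diff_ci(tight, wide)
print(f"95% CI for the difference in mean radius: ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 0 suggests a difference in spread. A shift in the centre between groups does not matter here because each resample is measured from its own centre.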
 

SGK

New Member
#3
Hi. Thanks for the response.

The groups are less than 1 inch square, often half that. Sample size from 5 and up to, say, 20 or more.

I was thinking about it some more. Given I don't care about a shift in the (mean of x, mean of y) center, maybe what I'm sampling is simply the lengths or radii from this sample center. The true center may be different but as n increases the sample group center approaches the true (population) center. I simply want to see which sample suggests less variance in radii. The first lot of samples might have, for example, more powder in the cartridge case than the second set of samples. Sure, the center of impact might shift but I'm not worried about that. I'm only focusing on whether the variance from center was less. If I am just sampling radii then things get simpler, no? I can calculate the mean, SD, ES etc.

But what concerns me with this thinking - and it's perhaps what you are alluding to - is that my calculated radii are a derivative of the sample mean center which may not be right. The radii I'm recording aren't pure samples of the population behavior.

"The t test approach needs the distances to the centre to be normal which they won't be."

They might be normal in 3D when plotting the distribution of (angle, distance from center). By this I mean more shots coalescing towards the center and less, in all directions, away from center. Rather than a classic 2D normal distribution sketch, it looks more like a volcano in 3D. (I'm probably not conveying what I mean very well.)
 

SGK

New Member
#4
When I first started thinking about this I considered myself to be collecting paired sample data (x,y) coordinates on a flat plane. Normally one would be interested in correlation in a paired sample test, calculate a regression etc. But here there would be no x/y correlation and I'm interested in the rate of dispersion. I could calculate the sample centre but, depending on the dispersion, I could only have confidence that the true center was within a circle of this sample center. Then when I tried to think about a confidence interval on the dispersion I was lost.
 

katxt

Well-Known Member
#5
"They might be normal in 3D when plotting the distribution of (angle, distance from center). By this I mean more shots coalescing towards the center and less, in all directions, away from center. Rather than a classic 2D normal distribution sketch, it looks more like a volcano in 3D. (I'm probably not conveying what I mean very well.)"
The distance may well be bivariate normal. In other words if you squash them down onto a line from any direction they make a normal curve. However, the distances from the centre are always positive. There will be a lump near 0 and a long tail.
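A small simulation sketch of the "lump near 0 and a long tail" shape. It assumes independent horizontal and vertical errors with the same SD (set to 1 here, an arbitrary choice) and measures distances from the true centre. Under those assumptions the distances follow a Rayleigh distribution, which is right-skewed, so the mean sits above the median.

```python
import math
import random
import statistics

rng = random.Random(1)
sigma = 1.0  # assumed per-axis SD, arbitrary units

# Distance of each simulated shot from the true centre (0, 0)
radii = [
    math.hypot(rng.gauss(0, sigma), rng.gauss(0, sigma))
    for _ in range(100_000)
]

mean_r = statistics.mean(radii)
median_r = statistics.median(radii)
print(f"min = {min(radii):.3f}, median = {median_r:.3f}, mean = {mean_r:.3f}")
```

All distances are non-negative and the mean exceeds the median, which is the asymmetry that makes a plain t-test on the distances questionable for small groups.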
 

SGK

New Member
#6
"The distance may well be bivariate normal. In other words if you squash them down onto a line from any direction they make a normal curve. However, the distances from the centre are always positive. There will be a lump near 0 and a long tail."
Coordinates would have negative values as well and hence while the lengths are all positive, for any given angle (let's say through 45 degrees and 225 degrees) there would be bivariate normality (or one could assume so).
 

katxt

Well-Known Member
#7
That's right. Squashed from any angle would be normal. But the problem is how to use that.
Here's another approach. Make your own table.
Simulate say n = 5 normal pairs. Find the centre. Find the length of each distance from the calculated centre. Average them. Repeat 10000 times. Find the mean of all the means and then the SE (SD) of those means.
Write down the pair n = 5 with its %SE of the mean.
Repeat for n= 10, 15, and 20 to make a small table. Don't lose the table.
[A quick calculation gave me for n = 10, SE is about 17% of the mean.]
Now to use it. Get your two spreads. Find the centres and distances. Find the mean distance for each spread. Find the difference in the means. Use your new table to calculate the SE of each mean. The SE of the difference in the means is sqrt(SE1^2+SE2^2). The margin of error MoE for the difference is near enough to 2*SE difference.
Does 0 lie within the MoE? No - a sig diff. Yes - no difference established. kat
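The recipe above can be sketched as follows. The function names are made up for illustration, and the simulation assumes independent x and y errors with equal SD (set to 1, which does not affect the percentage). The exact percentages depend on the random seed, so they need not match the figure quoted above to the decimal.

```python
import math
import random
import statistics

def mean_radius_from_sample_centre(n, rng, sigma=1.0):
    """Simulate n normal pairs; return mean distance from their own centre."""
    xs = [rng.gauss(0, sigma) for _ in range(n)]
    ys = [rng.gauss(0, sigma) for _ in range(n)]
    cx, cy = sum(xs) / n, sum(ys) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)) / n

def percent_se(n, reps=10_000, seed=0):
    """SE of the mean radius, as a percentage of the mean, from `reps` runs."""
    rng = random.Random(seed)
    means = [mean_radius_from_sample_centre(n, rng) for _ in range(reps)]
    return 100 * statistics.stdev(means) / statistics.mean(means)

def compare_mean_radii(mean_a, pct_se_a, mean_b, pct_se_b):
    """Return (difference, margin of error, significant?) for two groups."""
    se_a = mean_a * pct_se_a / 100
    se_b = mean_b * pct_se_b / 100
    se_diff = math.sqrt(se_a**2 + se_b**2)
    diff = mean_a - mean_b
    moe = 2 * se_diff  # "near enough to 2*SE difference"
    return diff, moe, abs(diff) > moe

# Build the small table of %SE by group size
table = {n: percent_se(n) for n in (5, 10, 15, 20)}
for n, pct in table.items():
    print(f"n = {n:2d}: SE is about {pct:.1f}% of the mean")

# Made-up example: two groups of 10 with mean radii of 6.0 mm and 9.0 mm
diff, moe, significant = compare_mean_radii(6.0, table[10], 9.0, table[10])
print(f"difference = {diff:.2f} mm, MoE = {moe:.2f} mm, significant: {significant}")
```

Because the radii are measured from each simulated sample's own centre, the table already reflects the dependence between the distances within a group.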
 

Karabiner

TS Contributor
#8
"I am an amateur trying to recall high school/college statistics. I'm looking at shooting group analysis. I can identify the center of the group and consider the shots to represent random samples of variances from this center. I can calculate for each sample shot the distance in mm from this sample center. I can calculate the extreme spread of these. Can I calculate the mean and standard deviation in the same, conventional manner?"
If you have a set of distances, you can calculate all the descriptive statistics (mean distance, median distance, variance of the distances etc.),
and you can compare these with those from other sets. But if you want to use statistical tests, then the problem is that the observations are not independent. The distance from a center for any observed subject depends on the values of the other subjects.

With kind regards

Karabiner
 

katxt

Well-Known Member
#9
"the problem is that the observations are not independent. The distance from a center for any observed subject depends on the values of the other subjects."
Very true. This is the advantage of using a simulation to find the distribution of any statistic you may use. It automatically incorporates the natural dependence into the distribution.
 

katxt

Well-Known Member
#12
Results of a little experiment comparing two groups of 5. Runs of 10000.
t test, unequal variance: alpha = 7%, power to detect one group 3x the other = 60%.
Mann-Whitney test: alpha = 8%, power to detect one group 3x the other = 70%.
CI test outlined above, using SE = 25% of the mean (got from simulation): alpha = 5%, power to detect one group 3x the other = 80%.
For the CI test, a rule of thumb is that SE = 55%/sqrt(n) x mean.
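A sketch of how the alpha and power of the CI test might be checked by simulation, using the rule of thumb above for the SE. It assumes circular bivariate normal shots and a 2*SE margin of error. The function names are made up, and the rates it prints depend on the seed, so they will only roughly track the figures reported above.

```python
import math
import random

def mean_radius(n, sigma, rng):
    """Mean distance from the sample centre for one simulated group."""
    xs = [rng.gauss(0, sigma) for _ in range(n)]
    ys = [rng.gauss(0, sigma) for _ in range(n)]
    cx, cy = sum(xs) / n, sum(ys) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)) / n

def ci_test_rejects(n, sigma_a, sigma_b, rng):
    """True if 0 lies outside the margin of error for the difference."""
    mean_a = mean_radius(n, sigma_a, rng)
    mean_b = mean_radius(n, sigma_b, rng)
    k = 0.55 / math.sqrt(n)  # rule of thumb: SE = 55%/sqrt(n) x mean
    se_diff = math.sqrt((k * mean_a) ** 2 + (k * mean_b) ** 2)
    return abs(mean_a - mean_b) > 2 * se_diff

def rejection_rate(n, ratio, reps=10_000, seed=0):
    """Fraction of simulated comparisons declared significant."""
    rng = random.Random(seed)
    hits = sum(ci_test_rejects(n, 1.0, ratio, rng) for _ in range(reps))
    return hits / reps

alpha_hat = rejection_rate(5, 1.0)  # both groups have the same spread
power_hat = rejection_rate(5, 3.0)  # one group has 3x the spread
print(f"estimated alpha = {alpha_hat:.3f}, estimated power = {power_hat:.3f}")
```

Changing `n` and `ratio` in the last two calls gives a feel for how large the groups need to be before a given difference in spread is detected reliably.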
 

katxt

Well-Known Member
#14
You did, fed2, and that's fine. We were all just guessing anyway but now we know. It probably doesn't matter much because I think SGK was hoping to reliably distinguish between two groups much closer than 3 times the spread.
 

SGK

New Member
#15
Catching up here

"But if you want to use statistical tests, then the problem is that the observations are not independent. The distance from a center for any observed subject depends on the values of the other subjects."
I think you're saying what we admitted to above. Each shot is completely independent of the prior and following but they all together determine centre and this sample centre may not necessarily remain the same as more shots are fired (and we get better at determining population centre). From one sample set to the next they can be independent if there is a difference in population behavior. If, for example, I place a significant amount more gunpowder in the second set of cases vs the first I'd expect a new centre on target. The question is not whether the centre is different (I'd expect it to be different in almost all cases - in fact I can't imagine why it would be the same unless everything from one sample to the next was identical). The question is whether the level of dispersion around each is materially different. (If more gunpowder led to less dispersion, i.e. greater accuracy, I'd shoot that and adjust the point of aim.)

katxt, I need to understand better your last couple of posts. I've not had time to focus on this over the last few days. What I wanted to do before responding was to run an actual example to see how much the sample center moved with n number of shots in the sample and the impact on calculated radii. I suspect it doesn't take many shots for the centre to be reasonably accurate and the impact on radii to be little. Maybe then there is some simplification by assuming that if n>x we assume the sample centre is the population centre...?
 

katxt

Well-Known Member
#16
I don't know of any formula that gives you the accuracy of the sample centre compared with the true centre with bivariate normal data. I do know that if there is one it will involve 1/sqrt(n) so a sample of 20 will have a centre uncertainty of half that of a sample of 5.
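A quick simulation sketch consistent with the 1/sqrt(n) scaling described above. It assumes independent x and y errors with equal SD (set to 1, arbitrary units) and a true centre at (0, 0). The function names are made up for illustration.

```python
import math
import random
import statistics

def centre_error(n, rng, sigma=1.0):
    """Distance between the sample centre of n shots and the true centre."""
    cx = sum(rng.gauss(0, sigma) for _ in range(n)) / n
    cy = sum(rng.gauss(0, sigma) for _ in range(n)) / n
    return math.hypot(cx, cy)

def mean_centre_error(n, reps=20_000, seed=0):
    """Average centre error over many simulated groups of size n."""
    rng = random.Random(seed)
    return statistics.mean([centre_error(n, rng) for _ in range(reps)])

err5 = mean_centre_error(5)
err20 = mean_centre_error(20)
ratio = err5 / err20
print(f"n = 5: {err5:.3f}, n = 20: {err20:.3f}, ratio = {ratio:.2f}")
```

The ratio should come out close to 2, in line with a sample of 20 having about half the centre uncertainty of a sample of 5.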
"The question is whether the level of dispersion around each is materially different."
You will have a hard time ahead of you if you want to show that one set has, say, 50% greater dispersion than another. Even with groups of 20, you will only get a positive result less than 2/3 of the time.