# Correct assessment of the sampling distribution of the mean for my experiment?

#### lnhstats

##### New Member
Hi, all. I'm trying to properly analyze data from an experiment and could use some help bridging the mental gap between statistical theory (which I understand the basics of) and what I did in the laboratory.

I have a bulk amount of material and am testing it for the concentration of a compound. During analysis I take a sample of size n=6 objects ("object" being a small amount of the total mass of the material) from the bulk and run an assay to quantify the concentration of the compound in each of the 6 objects. Each object is treated following the instructions for the assay, and at the end is measured 3 times. The purpose of measuring the same object 3 times is to get a feel for instrument/scientist variation. So, each of the 6 objects generates an average measurement from the triplicate readings, and then an average concentration is generated from the 6 objects in the sample.

My colleagues believe that the 6 object measurements constitute a sampling distribution of the mean ("mean" being the average compound concentration in the bulk material), since each object itself has an average measurement from the triplicate readings. My understanding is that this is not so: the triplicate readings are simply a way to assess the extent of experimental error, and the average of the triplicate readings should be treated as the true measurement. If this is the case, the average of the six measurements is simply one datapoint in the sampling distribution of the mean. To generate a true sampling distribution of the mean, this protocol would have to be repeated numerous times, producing more samples of 6 objects and an average measurement for each sample.
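The two-stage averaging described above can be sketched in a few lines. The readings below are made-up numbers, purely for illustration:

```python
# Two-stage averaging sketch: hypothetical triplicate readings for each of
# the 6 objects; each object's triplicates are averaged first, then the six
# object-level values are averaged into one sample mean.
readings = [
    [10.1, 10.3, 10.2],   # object 1: three replicate readings (made up)
    [ 9.8,  9.9, 10.0],   # object 2
    [10.4, 10.2, 10.3],   # object 3
    [ 9.7,  9.9,  9.8],   # object 4
    [10.0, 10.1, 10.2],   # object 5
    [10.3, 10.1, 10.2],   # object 6
]

object_means = [sum(r) / len(r) for r in readings]    # 6 "true" measurements
sample_mean = sum(object_means) / len(object_means)   # one datapoint in the
                                                      # sampling distribution
print(object_means, sample_mean)
```

The six `object_means` are the sample of size n = 6; `sample_mean` is the single datapoint that each repetition of the whole protocol would add to the sampling distribution of the mean.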

Is this an accurate thought process? I'd like to make sure I'm parsing the difference between replicate measurements of a given object and measurements of different objects in a sample correctly. Thank you!

#### Miner

##### TS Contributor
I would agree that averaging the three measurements per object is simply a way to minimize the effect of measurement variation. Therefore, the six objects constitute a sample of size n = 6 from the population (the bulk material). Whether this is adequate for assessing the population mean will depend on the homogeneity of the bulk material.

#### lnhstats

##### New Member
Great, thanks for that confirmation. Now, as you said, whether or not 6 objects is an appropriate sample size to estimate the population mean would depend on the homogeneity of the bulk material. Would it be appropriate for me to calculate the standard deviation and sample average, and then construct a confidence interval (say, a 95% confidence interval), and state that the population mean is within ___ and ___ with ___ confidence? If so, would it be more appropriate to use the uncorrected standard deviation (divided by N) or the corrected standard deviation (divided by N - 1)?

Thank you again for your help!

#### Miner

##### TS Contributor
> Would it be appropriate for me to calculate the standard deviation and sample average, and then construct a confidence interval (say, a 95% confidence interval), and state that the population mean is within ___ and ___ with ___ confidence?

Yes.

> If so, would it be more appropriate to use the uncorrected standard deviation (divided by N) or the corrected standard deviation (divided by N - 1)?

Use the corrected standard deviation for sample statistics. The uncorrected form is appropriate only when you have measured the entire population.
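As a concrete sketch (the object means are hypothetical, and the t critical value is taken from a table rather than computed), a 95% confidence interval with the corrected standard deviation might look like:

```python
# 95% CI sketch with the corrected (n - 1) standard deviation; object means
# are hypothetical and t_{0.975, df=5} = 2.571 is taken from a t-table.
import math
import statistics

object_means = [10.2, 9.9, 10.3, 9.8, 10.1, 10.2]   # made-up data, n = 6
n = len(object_means)

xbar = statistics.mean(object_means)
s = statistics.stdev(object_means)     # corrected: divides variance by n - 1
# statistics.pstdev divides by n; use it only for a complete population.

half_width = 2.571 * s / math.sqrt(n)
print(f"95% CI: {xbar - half_width:.3f} to {xbar + half_width:.3f}")
```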

#### lnhstats

##### New Member
Thank you for your help there!

I do have another question in a similar vein. Let's say that the point of my experiment is to determine whether or not a particular batch of bulk material is below a certain concentration of the compound. In other words, I have a threshold compound concentration that the bulk material's average concentration should not exceed. How would one most accurately determine what sample size is appropriate? What's currently done is: the average of the n=6 objects in the sample is calculated, along with the standard deviation, and a 95% confidence interval is constructed. If the limits of the confidence interval do not cross the threshold, then the bulk material from which the sample was drawn "passes" (because, with 95% confidence, the average concentration in the bulk material does not exceed the threshold).
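The pass/fail rule described above could be sketched as follows; the threshold and data are hypothetical, and since the concern is exceeding the threshold, only the upper confidence limit matters:

```python
# Sketch of the current pass/fail rule, with hypothetical data and a
# hypothetical threshold; t_{0.975, df=5} = 2.571 is taken from a t-table.
import math
import statistics

threshold = 10.5                                    # hypothetical spec limit
object_means = [10.2, 9.9, 10.3, 9.8, 10.1, 10.2]  # made-up object averages
n = len(object_means)

xbar = statistics.mean(object_means)
s = statistics.stdev(object_means)                  # corrected (n - 1) form
upper = xbar + 2.571 * s / math.sqrt(n)             # upper 95% CI limit

# The batch "passes" when the CI stays below the threshold.
print("pass" if upper < threshold else "fail")
```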

How, though, can I perform an accurate assessment of the necessary sample size? Most of what I've seen about calculating sample sizes involves comparing two averages. How do I compare some measured parameter against a nominal value for the purposes of sample size determination?

#### Miner

##### TS Contributor
I would calculate the sample size for a 1-sample t-test using a specified difference, power and within-batch standard deviation.
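One quick way to approximate this with only the standard library is the normal-approximation formula n ≈ ((z_{1-alpha/2} + z_{power}) * s / delta)^2; it slightly underestimates n for small samples, where the exact calculation uses the noncentral t-distribution. The values of s and delta below are hypothetical:

```python
# Normal-approximation sample size for a two-sided 1-sample test. This is a
# rough sketch; exact answers require the noncentral t-distribution.
import math
from statistics import NormalDist

def approx_sample_size(s: float, delta: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate n to detect a difference delta with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for power=0.8
    return math.ceil(((z_alpha + z_power) * s / delta) ** 2)

# Hypothetical within-batch standard deviation and difference to detect:
print(approx_sample_size(s=0.2, delta=0.3))
```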

#### BGM

##### TS Contributor
Not to confuse you but you may consider the following model:

$$X_{ij} = \mu + \tau_i + \epsilon_{ij}, i = 1,2, \ldots, 6; j = 1, 2, 3$$

where

$$X_{ij}$$ are the measurements,

$$\mu$$ is the true population mean,

$$\tau_i$$ is a mean-zero random error representing the variation across different objects, and

$$\epsilon_{ij}$$ is a mean-zero random error representing the measurement errors.
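This model can be checked with a small simulation. Under it, the variance of the protocol's final average is (sigma_tau^2 + sigma_eps^2 / 3) / 6, since the triplicates shrink only the measurement-error component. The parameter values below are arbitrary choices for illustration:

```python
# Simulation sketch of the model above: X_ij = mu + tau_i + eps_ij.
# All parameter values are arbitrary, for illustration only.
import random
import statistics

random.seed(42)
mu, sigma_tau, sigma_eps = 10.0, 0.3, 0.1

sample_means = []
for _ in range(2000):                   # repeat the whole n=6 protocol
    object_means = []
    for _ in range(6):
        tau = random.gauss(0, sigma_tau)              # object-to-object term
        reps = [mu + tau + random.gauss(0, sigma_eps) for _ in range(3)]
        object_means.append(statistics.mean(reps))    # triplicate average
    sample_means.append(statistics.mean(object_means))

# Under the model, Var(sample mean) = (sigma_tau^2 + sigma_eps^2 / 3) / 6.
expected = (sigma_tau**2 + sigma_eps**2 / 3) / 6
print(statistics.variance(sample_means), expected)
```

The simulated variance of the repeated sample means should land close to the model's prediction, which makes the roles of the two error terms concrete.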

#### Miner

##### TS Contributor
If the t-statistic for a 1-sample t-test is $$t=\frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}$$, where $$\bar{x}$$ is the sample mean, and $$\mu_0$$ is the hypothesized population mean, then (if my math works okay):

$$n=\left(\frac{s\,t}{\bar{x}-\mu_0}\right)^2$$

For practical application, $$\bar{x}-\mu_0$$ may be replaced by $$\delta$$, the difference that you wish to detect, giving:

$$n=\left(\frac{s\,t}{\delta}\right)^2$$
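Because t itself depends on df = n - 1, the formula has to be solved iteratively, or by searching for the smallest n that satisfies it. Here is a sketch with a small embedded t-table (two-sided 95%; df above 15 falls back to the normal value 1.96, so results for larger n are approximate). The values of s and delta are hypothetical:

```python
# Search for the smallest n satisfying Miner's n >= (s * t / delta)^2,
# where t = t_{0.975, n-1}. T_TABLE holds two-sided 95% critical values;
# df > 15 falls back to the normal approximation 1.96.
T_TABLE = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
           6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
           11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131}

def t_crit(df: int) -> float:
    return T_TABLE.get(df, 1.96)

def sample_size(s: float, delta: float, n_max: int = 10_000) -> int:
    for n in range(2, n_max + 1):
        if n >= (s * t_crit(n - 1) / delta) ** 2:
            return n
    return n_max

# Hypothetical within-batch standard deviation and difference to detect:
print(sample_size(s=0.2, delta=0.3))
```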
