# How to calculate sample size to check if 95% of values are between predefined upper and lower limit

#### Yggdrasil

##### New Member
Dear Forum, I am stuck on a sample size estimation question. I looked at the FAQs and the previous questions but I do not think the answers there apply to my scenario (correct me if I am mistaken).

Imagine I want to produce apple pies. My pie only tastes good if the apples have an acid content between 500 and 600 (mg/100g). I told a farmer that I will buy his apples if 95% of them have an acidity within these limits.

I am struggling to calculate the number of apples that I would need to test to have a chance of 80% that at least 95% of the farmer’s apples fulfill my acid content criteria.

My approach was to imagine a hypothetical population of apples with a normally distributed acidity with a mean of 550 (mg/100g) and a standard deviation of about 25.5 (mg/100g). I have selected these values so that 95% of the acidity values lie between 500 and 600 (mg/100g), i.e. within mean ± 1.96 standard deviations (50 / 1.96 ≈ 25.5).
I should then be able to test if the farmer’s apples are from the same distribution using Kolmogorov-Smirnov. However, I fail to find a way to predict the necessary sample size for this.
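
A quick way to sanity-check the spread of such a hypothetical population is to compute it directly from the normal distribution. The sketch below assumes a normal population centred at 550 with limits 500 and 600, as in the post; the function names are made up for illustration.

```python
from statistics import NormalDist

LOWER, UPPER, MEAN = 500.0, 600.0, 550.0  # limits and mean from the post


def fraction_within(sd):
    """Fraction of a Normal(MEAN, sd) population lying between LOWER and UPPER."""
    dist = NormalDist(MEAN, sd)
    return dist.cdf(UPPER) - dist.cdf(LOWER)


def sd_for_coverage(coverage=0.95):
    """Standard deviation for which exactly `coverage` of the population
    falls inside the (symmetric) limits."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)  # about 1.96 for 95%
    return (UPPER - MEAN) / z


if __name__ == "__main__":
    sd = sd_for_coverage(0.95)
    print(f"sd giving 95% coverage: {sd:.2f}")
    print(f"check, fraction within limits: {fraction_within(sd):.4f}")
```

Running it gives a standard deviation of roughly 25.5 for 95% coverage. `fraction_within` can be used to check any other candidate value.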

Is there a way to calculate the sample size necessary for a Kolmogorov-Smirnov test?
Does my approach make any sense at all?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
There is a recent paper by Ken Rothman, I think, on how to calculate sample size based on a desired confidence interval. I believe someone made a Shiny app for it.

#### Dason

##### Ambassador to the humans
Even if you rejected the KS test I don't think that actually answers the question you're trying to get at.

#### Yggdrasil

##### New Member
Thanks for your replies. I agree that a rejected Kolmogorov-Smirnov will not solve my problem. I was wondering if I could proceed like this:

1. I assume that the acidity of apples is normally distributed. This way, 95% of my samples should lie between x̄ +/- 2*σ.

2. I measure the acidity of 10 apples. Based on the measurements, I calculate x̄. Imagine this value would be x̄ = 565

3. I could then calculate σ based on the following formula:
upper_limit - x̄ = 2*σ
600 - 565 = 2*σ
σ = 17.5

4. Maybe I could then try to solve the formula for the standard deviation for n:
σ = sqrt( sum(from=1 to n) (x - x̄)² / (n-1) )

But again, I am quite sure that this approach is false.

#### Yggdrasil

##### New Member
@hlsmith
Thanks for the tip with the publication.
Did you mean 'Planning Study Size Based on Precision Rather Than Power' by Kenneth Rothman and Sander Greenland?
Unfortunately, it is not open access. From what I can see in the abstract, they base their calculation on the confidence interval. But this should only give me a 95% chance that my true mean lies within the CI range, not that 95% of the samples are within certain limits. Or am I wrong?
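
The difference between the two kinds of interval can be illustrated numerically. This is only a sketch: it treats the standard deviation as known (17.5, the value from the earlier post) and uses the normal approximation, so the numbers are illustrative rather than exact.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 17.5                      # assumed known, taken from the earlier post
Z = NormalDist().inv_cdf(0.975)   # about 1.96


def ci_half_width(n):
    """Half-width of a 95% confidence interval for the MEAN with n apples."""
    return Z * SIGMA / sqrt(n)


def population_half_width():
    """Half-width of the range containing 95% of INDIVIDUAL apples."""
    return Z * SIGMA


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n:5d}  CI for mean: ±{ci_half_width(n):6.2f}   "
              f"95% of apples: ±{population_half_width():6.2f}")
```

The confidence interval for the mean shrinks as n grows, while the range covering 95% of individual apples stays the same width. A precision-based sample size for the mean therefore does not answer the question about individual apples.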

#### Dason

##### Ambassador to the humans
If you don't want to make distribution assumptions on the levels themselves you could just treat it as a sample size calculation for a single proportion.
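
One possible reading of the single-proportion idea, sketched under my own assumptions: classify each apple as in or out of spec, and find the smallest n such that a batch with exactly 95% good apples would pass the check with probability at most 20%. The function names and the `max_out_of_spec` acceptance rule are illustrative, not something specified in the thread.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def min_sample_size(p_in_spec=0.95, confidence=0.80, max_out_of_spec=0,
                    n_max=10_000):
    """Smallest n such that seeing at most `max_out_of_spec` bad apples in n
    supports 'at least p_in_spec are in spec' with the given confidence."""
    p_bad = 1.0 - p_in_spec
    for n in range(max_out_of_spec + 1, n_max + 1):
        # Chance that a borderline batch (exactly p_in_spec good) still passes
        if prob_at_most(max_out_of_spec, n, p_bad) <= 1.0 - confidence:
            return n
    raise ValueError("no sample size found up to n_max")


if __name__ == "__main__":
    print(min_sample_size(max_out_of_spec=0))  # all sampled apples must be in spec
    print(min_sample_size(max_out_of_spec=1))  # one out-of-spec apple tolerated
```

With no out-of-spec apples allowed, this comes out at 32 apples. Tolerating failures raises the required n. This approach needs no normality assumption, but it only uses the pass/fail result for each apple, not the measured acidity.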

#### Yggdrasil

##### New Member
@katxt: Fantastic! That's exactly what I was looking for.

In case somebody else has the same question:
You need to have an estimate for the mean µ and the standard deviation σ, either from a previous study or calculated on the basis of a preliminary test of maybe 10 to 20 apples (ideally, your preliminary values follow a normal distribution). Then you can calculate the number of samples for your actual study.

In my case, I had the following values:
estimated mean value = 565
estimated standard deviation = 17.5
1 - alpha = 80%
desired percentage of samples within the tolerance interval = 95%

I entered the data in https://statpages.info/tolintvl.html and played around with the sample number till I reached a two-sided interval of 530.0002 and 599.9998. I would need to test 960 apples.

There is also an R package with a function for this:
library(tolerance)
norm.ss(alpha = 0.2, P = 0.95, side = 2, spec = c(500, 600), method = "DIR", hyper.par = list(mu.0 = 565, sig2.0 = 17.5), m = 1)
# I set 'm = 1' to be consistent with the algorithm from the website
# Output:
# alpha  P     delta  P.prime  n
# 0.2    0.95                  961
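
For readers without R, the trial-and-error search described above can be sketched in plain Python. This is not the website's or the `tolerance` package's algorithm. It assumes Howe's approximation for the two-sided normal tolerance factor k and a Wilson-Hilferty approximation for the chi-square quantile, and all function names are made up for illustration. It looks for the smallest n for which mean ± k·sd still fits inside the specification limits.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (lower tail p)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def k_factor_howe(n, coverage=0.95, alpha=0.20):
    """Approximate two-sided tolerance factor k (Howe's formula):
    mean ± k*sd covers `coverage` of the population with confidence 1 - alpha."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    df = n - 1
    return z * sqrt(df * (1.0 + 1.0 / n) / chi2_quantile_wh(alpha, df))


def min_n_for_spec(mean, sd, lower, upper, coverage=0.95, alpha=0.20,
                   n_max=100_000):
    """Smallest n whose tolerance interval mean ± k*sd lies inside [lower, upper]."""
    margin = min(upper - mean, mean - lower)  # distance to the nearer limit
    for n in range(3, n_max + 1):
        if k_factor_howe(n, coverage, alpha) * sd <= margin:
            return n
    raise ValueError("no sample size found up to n_max")


if __name__ == "__main__":
    # Values from the post: mean 565, sd 17.5, limits 500 and 600
    n = min_n_for_spec(565.0, 17.5, 500.0, 600.0)
    k = k_factor_howe(n)
    print(n, 565.0 - k * 17.5, 565.0 + k * 17.5)
```

With the values from the post, the result should land close to the roughly 960 apples reported above. Because the interval only barely fits at that point, small differences between approximations can shift the answer by an apple or two. The large n comes from the estimated mean sitting exactly 2 standard deviations below the upper limit, which leaves almost no room for the tolerance factor to exceed 1.96.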