# Reconcile differences in sample size calculation for two proportions?

#### Buckeye

##### Active Member
Hi,

I'm reading John Lachin's Biostatistical Methods, 2nd edition. It gives a formula for the sample size of a two-proportion Z-test, which I coded in R below. The etas are the expected sample fractions in each group, pi1 and pi2 are the proportions of interest, and Z_alpha and Z_beta are the standard normal quantiles for alpha and beta respectively:
Code:
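sample_size <- function(eta1, eta2, pi1, pi2, Z_alpha, Z_beta) {
  pi <- (eta1 * pi1) + (eta2 * pi2)  # pooled proportion, weighted by the sample fractions
  phi0 <- sqrt((pi * (1 - pi)) * ((1 / eta1) + (1 / eta2)))  # SD term under the null
  phi1 <- sqrt(((pi1 * (1 - pi1)) / eta1) + ((pi2 * (1 - pi2)) / eta2))  # SD term under the alternative

  res <- (((Z_alpha * phi0) + (Z_beta * phi1)) / (pi1 - pi2))^2
  print(res)
}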

sample_size(.5,.5,.28,.4,qnorm(0.975,mean=0,sd=1),qnorm(0.90,mean=0,sd=1))
This gives a different result from:
Code:
power.prop.test(n = NULL, p1 = .4, p2 = .28, sig.level = 0.05,power = .9,alternative = c("two.sided"))
This article http://www.stat.ucla.edu/~vlew/stat130/WEEK7/dalgaard9.pdf explains that power.prop.test uses a normal approximation to the binomial distribution. When should one method be used over the other? I'm using prop.test, which computes a chi-square statistic. I know that if we square a standard normal we get a chi-squared. But I don't understand why I would use one method over the other. Additionally, why does prop.test give the option for a one-sided or two-sided test? Edit: alternative = "greater" or "less" is only used when testing a single proportion against a null value or when comparing two proportions; it is ignored with more than two groups. Makes sense.


#### Dason

You could code up a simulation with the given sample sizes to figure out the power for each result you're getting.

#### Buckeye

##### Active Member
I see. I'm noticing a fundamental difference in the calculations. I think each has its own justification. The one I coded seems to be a large-sample test with the Z statistic. On the other hand, power.prop.test assumes we are comparing frequencies in a contingency table. In other words, a chi-square test of independence between rows and columns. I'm wondering what the benefits and drawbacks of each are. The power in both examples is 90%. Essentially, if I use a Z test the sample size is 652, whereas if I use a chi-square test the sample size is 326. The chi-square sample size calculation uses a normal approximation to the binomial.
Code:
# N=326
power.prop.test(n = NULL, p1 = .28, p2 = .4,
sig.level = .05,power = .90,strict = TRUE,
alternative = c("two.sided"))

prop.test(x=c(91,130),n=c(326,326),conf.level = .95,alternative = "two.sided")  # about 28% and 40% of 326

# N=652
sample_size(.5,.5,.28,.4,qnorm(0.975,mean=0,sd=1),qnorm(0.90,mean=0,sd=1))


#### Buckeye

##### Active Member
Yes, they are. It's because the sample size formulas are different (borrowed from the article above):

#### Buckeye

##### Active Member
Suppose I have infinite time to collect data. Why would I choose one over the other?

#### Dason

I don't know. I don't typically like using these kinds of sample size calculators. Instead I just simulate and use that to estimate power over a range of sample sizes. That way I know the sample size I get will be accurate for the actual analysis I'll be doing.
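A minimal sketch of that simulation approach, assuming the proportions from this thread (0.28 and 0.40), equal group sizes, and a two-sided test at the 5% level. The function name `simulate_power`, the number of replicates, and the grid of sample sizes are illustrative choices, not anything from the thread, and `correct = FALSE` is an assumption made so that prop.test matches the uncorrected formulas being compared:

```r
# Estimate power by simulation: draw two binomial samples many times,
# run the test that will actually be used, and count the rejections.
simulate_power <- function(n_per_group, p1, p2, alpha = 0.05, n_sim = 2000) {
  x1 <- rbinom(n_sim, n_per_group, p1)  # simulated successes in group 1
  x2 <- rbinom(n_sim, n_per_group, p2)  # simulated successes in group 2
  p_values <- mapply(function(a, b) {
    prop.test(x = c(a, b), n = c(n_per_group, n_per_group),
              correct = FALSE)$p.value
  }, x1, x2)
  mean(p_values < alpha)  # proportion of simulated trials that reject H0
}

set.seed(1)  # arbitrary seed, only for reproducibility
sizes <- c(200, 326, 450)  # per-group sample sizes to try (illustrative grid)
power_by_n <- sapply(sizes, simulate_power, p1 = 0.28, p2 = 0.40)
names(power_by_n) <- sizes
print(power_by_n)
```

The estimate is noisy (roughly ±0.01 to ±0.02 with 2000 replicates), so increase `n_sim` if the power curve needs to be read more precisely.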

#### Buckeye

##### Active Member
That's fair. I suppose I can do that and see if the power I get back with the values I set lines up. I guess my question boils down to why run a chi square test as opposed to a Z test. I'll do more reading.
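On the chi-square versus Z question, a small check may help: for two groups, the pooled two-sided Z statistic squared equals the chi-square statistic that prop.test reports when the continuity correction is turned off, so the two tests give the same p-value. The counts below (91 and 130 out of 326) are illustrative values chosen to be close to 28% and 40%, and `correct = FALSE` is an assumption needed for the exact match:

```r
# Illustrative counts: roughly 28% and 40% of 326 per group.
x <- c(91, 130)
n <- c(326, 326)

# Two-sample Z statistic with the pooled variance estimate.
p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

# Chi-square test from prop.test without the continuity correction.
chisq_test <- prop.test(x, n, correct = FALSE)

# Both comparisons should return TRUE.
print(all.equal(unname(chisq_test$statistic), z^2))
print(all.equal(chisq_test$p.value, z_p_value))
```

With the default `correct = TRUE`, prop.test applies Yates' continuity correction, so its statistic is slightly smaller than z squared and the p-value slightly larger.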

#### fed2

##### Active Member
I checked your example against SAS proc power; it gives exactly 652, the same as your example (apparently Example 3.2 from your book).

proc power;
twosamplefreq test=pchi
groupproportions = (0.4 0.28)
ntotal = .
power = 0.9;
run;
This uses a 'normal approximation method', according to SAS.

Power analysis is not exactly an exact science. No one agrees within ±30%, so don't obsess over a few ordinary lives.

#### fed2

##### Active Member
Concerning R vs. SAS, is one giving npergroup and the other ntotal? The two agree within ±5 as far as I can tell.

#### Buckeye

##### Active Member
Oh, I didn't notice that. I'll take a closer look. Edit: Yes! The formula I coded is the total N whereas power.prop.test is per group. I thought I was losing my mind. Thanks.
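To make the total versus per-group point concrete, here is a sketch that evaluates both calculations side by side. The helper `lachin_total_n` is a renamed, self-contained copy of the formula coded earlier in the thread (the name and the `alpha`/`power` arguments are illustrative), and power.prop.test is called with its default `strict = FALSE`:

```r
# Total sample size (both groups combined) from the formula coded earlier.
lachin_total_n <- function(eta1, eta2, pi1, pi2, alpha = 0.05, power = 0.90) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
  z_beta  <- qnorm(power)
  pi_bar  <- eta1 * pi1 + eta2 * pi2
  phi0 <- sqrt(pi_bar * (1 - pi_bar) * (1 / eta1 + 1 / eta2))
  phi1 <- sqrt(pi1 * (1 - pi1) / eta1 + pi2 * (1 - pi2) / eta2)
  ((z_alpha * phi0 + z_beta * phi1) / (pi1 - pi2))^2
}

total_n <- lachin_total_n(0.5, 0.5, 0.28, 0.40)

# power.prop.test reports n for EACH group, not the total.
per_group_n <- power.prop.test(p1 = 0.28, p2 = 0.40,
                               sig.level = 0.05, power = 0.90)$n

print(c(total = total_n,
        per_group = per_group_n,
        per_group_doubled = 2 * per_group_n))
```

With equal allocation (eta1 = eta2 = 0.5), the total from the formula should match twice the per-group n up to the solver's tolerance. Rounding the per-group value up gives 326 per group, or 652 in total, which are the two figures quoted in the thread.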


#### fed2

##### Active Member
well at least you had the good sense to cross check against a known result.