Reconcile differences in sample size calculation for two proportions?

#1
Hi,

I'm reading John Lachin's Biostatistical Methods, 2nd edition. It gives a formula for the sample size of a two-proportion Z-test, which I coded in R below. The etas are the expected sample fractions in each group, pi1 and pi2 are the proportions of interest, and Z_alpha and Z_beta are the standard normal quantiles corresponding to alpha and beta, respectively:
Code:
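sample_size <- function(eta1, eta2, pi1, pi2, Z_alpha, Z_beta) {
  # Weighted average proportion (named pi_bar so it does not mask R's built-in pi)
  pi_bar <- (eta1 * pi1) + (eta2 * pi2)

  # Standard-error factors under the null (phi0) and the alternative (phi1)
  phi0 <- sqrt((pi_bar * (1 - pi_bar)) * ((1 / eta1) + (1 / eta2)))
  phi1 <- sqrt(((pi1 * (1 - pi1)) / eta1) + ((pi2 * (1 - pi2)) / eta2))

  # Sample size given by the formula; returned so it can be reused or printed
  res <- (((Z_alpha * phi0) + (Z_beta * phi1)) / (pi1 - pi2))^2
  res
}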

sample_size(.5,.5,.28,.4,qnorm(0.975,mean=0,sd=1),qnorm(0.90,mean=0,sd=1))
This gives a different result from:
Code:
power.prop.test(n = NULL, p1 = .4, p2 = .28, sig.level = 0.05,power = .9,alternative = c("two.sided"))
This article http://www.stat.ucla.edu/~vlew/stat130/WEEK7/dalgaard9.pdf explains that power.prop.test uses a normal approximation to the binomial distribution. When should one method be used over the other? I'm using prop.test, which computes a chi-square statistic. I know that squaring a standard normal gives a chi-squared, but I don't understand why I would use one method over the other. Additionally, why does prop.test give the option of a one-sided or two-sided alternative? Edit: alternative = "greater" or "less" is only used when testing a single proportion against a null value or when testing that two proportions are equal. Makes sense.
 

Dason

Ambassador to the humans
#2
You could code up a simulation with the given sample sizes to figure out the power for each result you're getting
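A minimal sketch of such a simulation, assuming an uncorrected pooled two-sample Z-test (equivalent to prop.test with correct = FALSE) and treating each candidate sample size as a per-group n. The function name simulate_power, the number of replicates, and the seed are illustrative choices, not anything from the thread:

```r
# Estimate power by simulation for a two-sided, two-proportion Z-test.
# Assumption: n is the sample size PER GROUP and the test is the
# uncorrected pooled Z-test (same as prop.test(..., correct = FALSE)).
simulate_power <- function(n, p1, p2, alpha = 0.05, n_sims = 20000) {
  # Simulated success counts in each group, one draw per replicate
  x1 <- rbinom(n_sims, size = n, prob = p1)
  x2 <- rbinom(n_sims, size = n, prob = p2)

  # Pooled proportion and Z statistic for every replicate at once
  p_pool <- (x1 + x2) / (2 * n)
  se <- sqrt(p_pool * (1 - p_pool) * (2 / n))
  z <- (x1 / n - x2 / n) / se

  # Replicates with an undefined statistic count as non-rejections
  reject <- !is.na(z) & abs(z) > qnorm(1 - alpha / 2)
  mean(reject)
}

set.seed(1)  # arbitrary seed, only for reproducibility
power_by_n <- sapply(c(326, 652), simulate_power, p1 = 0.28, p2 = 0.40)
names(power_by_n) <- c("n = 326", "n = 652")
print(round(power_by_n, 3))
```

Whichever candidate size lands near the 90% target is the one that matches what the test needs per group; a size that overshoots the target by a wide margin is likely being counted on a different basis.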
 
#3
I see. I'm noticing a fundamental difference in the calculations, and I think each has its own justification. The one I coded seems to be a large-sample test with the Z statistic. On the other hand, power.prop.test assumes we are comparing frequencies in a contingency table, in other words a chi-square test of independence between rows and columns. I'm wondering what the benefits and drawbacks of each are. The power in both examples is 90%. Essentially, if I use a Z test the sample size is 652, whereas if I use a chi-square test the sample size is 326. The chi-square sample size calculation uses a normal approximation to the binomial.
Code:
# N=326
power.prop.test(n = NULL, p1 = .28, p2 = .4,
                sig.level = .05,power = .90,strict = TRUE,
                alternative = c("two.sided"))

# x = expected successes per group (0.40 * 326 and 0.28 * 326, rounded)
prop.test(x=c(130,91),n=c(326,326),conf.level = .95,alternative = "two.sided")

# N=652
sample_size(.5,.5,.28,.4,qnorm(0.975,mean=0,sd=1),qnorm(0.90,mean=0,sd=1))
 

Dason

Ambassador to the humans
#7
I don't know. I don't typically like using these kinds of sample size calculators. Instead I just simulate and use that to estimate power over a range of sample sizes. That way I know the sample size I get will be accurate for the actual analysis I'll be doing.
 
#8
That's fair. I suppose I can do that and see whether the power I get back lines up with the values I set. I guess my question boils down to why one would run a chi-square test as opposed to a Z test. I'll do more reading.
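For a two-group comparison the two are the same test: the chi-square statistic from prop.test without continuity correction is the square of the pooled Z statistic. A small sketch of that check, where the counts 130 and 91 out of 326 are hypothetical values chosen to roughly match proportions of 0.40 and 0.28:

```r
# Hypothetical counts: about 40% and 28% successes out of 326 per group
x <- c(130, 91)
n <- c(326, 326)

# Pooled two-sample Z statistic computed by hand
p_hat <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Chi-square test of the same table, with the continuity correction off
chi <- prop.test(x, n, correct = FALSE)

# The chi-square statistic equals z^2, and the two-sided p-values agree
print(c(z_squared = z^2, chi_square = unname(chi$statistic)))
print(c(p_from_z = 2 * pnorm(-abs(z)), p_from_chi = chi$p.value))
```

Note that prop.test applies Yates' continuity correction by default (correct = TRUE), so its default statistic is slightly smaller than z^2.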
 

fed2

Active Member
#9
I checked your example against SAS PROC POWER; it gives exactly 652, the same as your example (apparently Example 3.2 from your book).

proc power;
twosamplefreq test=pchi
groupproportions = (0.4 0.28)
ntotal=.
power=0.9;
run;
This uses a 'normal approximation method', according to SAS.

Power analysis is not exactly an exact science. No one agrees within ±30%, so don't obsess over a few ordinary lives.
 
#11
Oh, I didn't notice that. I'll take a closer look. Edit: Yes! The formula I coded gives the total N, whereas power.prop.test reports n per group (326 per group, 652 in total). I thought I was losing my mind. Thanks.
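The reconciliation can be checked numerically. The sketch below assumes equal allocation (eta1 = eta2 = 0.5) and uses a hypothetical helper, lachin_total_n, that restates the formula from the first post; it compares that total N with twice the per-group n from power.prop.test:

```r
# Hypothetical helper restating the formula from the first post;
# it returns the TOTAL sample size across both groups.
lachin_total_n <- function(eta1, eta2, pi1, pi2, alpha = 0.05, power = 0.90) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
  z_beta <- qnorm(power)
  pi_bar <- eta1 * pi1 + eta2 * pi2
  phi0 <- sqrt(pi_bar * (1 - pi_bar) * (1 / eta1 + 1 / eta2))
  phi1 <- sqrt(pi1 * (1 - pi1) / eta1 + pi2 * (1 - pi2) / eta2)
  ((z_alpha * phi0 + z_beta * phi1) / (pi1 - pi2))^2
}

total_n <- lachin_total_n(0.5, 0.5, 0.28, 0.40)

# power.prop.test solves for n PER GROUP
per_group_n <- power.prop.test(p1 = 0.28, p2 = 0.40,
                               sig.level = 0.05, power = 0.90)$n

print(c(total = total_n, twice_per_group = 2 * per_group_n))
print(c(per_group_rounded = ceiling(per_group_n),
        total_rounded = 2 * ceiling(per_group_n)))
```

With these inputs the two calculations agree up to the solver tolerance; rounding the per-group n up gives 326 per group, or 652 in total.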
 