# Thread: Sample size calculation; negative binomial distribution

1. ## Sample size calculation; negative binomial distribution

I am interested in doing a sample size calculation for exacerbation rates. I am drawing my data from a negative binomial distribution, and my test statistic is based on a negative binomial GLM. Currently I have code that calculates power for a given sample size. How do I modify this code to calculate the sample size?
The input parameters for the sample size calculation are:
- alpha level = 0.04, target power = 90%
- event rate for the reference group (r0): mu0 <- 0.8/52*48
- rate ratio (rr) under Ha = mu1/mu0, where mu1 <- mu0*0.6
- average exposure time = 48 weeks
- sample size allocation ratio = 1, so each group has equal sample size (n1 = n0)
- dispersion parameter k = 0.4
Code:
```
library(MASS)

n     <- 400        # sample size per arm
alpha <- 0.04
int   <- 1000       # number of simulated trials
k     <- 1/0.4      # theta for rnegbin = 1/dispersion
mu0   <- 0.8/52*48  # control exacerbation rate over 48 weeks
mu1   <- mu0*0.6    # exacerbation rate for trt1 (rate ratio 0.6)
mu2   <- mu1        # exacerbation rate for trt2

t0  <- rep(0, n)
t1  <- rep(1, n)
ta1 <- c(t0, t1)    # treatment indicator: 0 = control, 1 = active
ta2 <- c(t0, t1)
c1 <- c2 <- test1 <- test2 <- NULL

for (i in 1:int) {

  pbo <- rnegbin(n, theta = k, mu = mu0)
  d1  <- rnegbin(n, theta = k, mu = mu1)
  d2  <- rnegbin(n, theta = k, mu = mu2)
  r1  <- c(pbo, d1)
  r2  <- c(pbo, d2)

  dt1 <- data.frame(ta1 = ta1, r1 = r1)
  dt2 <- data.frame(ta2 = ta2, r2 = r2)

  fm1 <- glm.nb(r1 ~ ta1, data = dt1)
  fm2 <- glm.nb(r2 ~ ta2, data = dt2)

  # p-values and z-statistics for the treatment effect
  test1 <- c(test1, summary(fm1)$coef[2, 4])
  test2 <- c(test2, summary(fm2)$coef[2, 4])
  c1 <- c(c1, summary(fm1)$coef[2, 3])
  c2 <- c(c2, summary(fm2)$coef[2, 3])

}

# power: proportion of simulations where both comparisons are significant
result <- sum((test1 < alpha) & (test2 < alpha)) / int
```

2. ## Re: Sample size calculation; negative binomial distribution

The simplest algorithm if you already have a way to calculate power given a sample size is to take a guess at the optimal sample size and calculate the power for that. If the power was too low then increase the sample size and try again. If the power was too high then decrease the sample size and try again. Eventually you'll converge to the required sample size.
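In code, the guess-and-adjust loop described above can be tightened into a bisection search on the sample size. A minimal sketch, assuming you have wrapped the simulation from the first post into a function `power_at(n)` (a hypothetical name) that returns the estimated power for a per-arm sample size `n`:

```r
# Bisection search for the smallest n with power_at(n) >= target.
# Relies on power being (approximately) monotone increasing in n.
find_n <- function(power_at, target = 0.90, lo = 10, hi = 2000) {
  while (hi - lo > 1) {
    mid <- floor((lo + hi) / 2)
    if (power_at(mid) < target) lo <- mid else hi <- mid
  }
  hi  # smallest n in (lo, hi] with estimated power >= target
}
```

Because simulated power is noisy, monotonicity only holds approximately; in practice you would increase `int` for the last few candidate values of `n` to pin down the answer.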

3. ## The Following User Says Thank You to Dason For This Useful Post:

biobee (04-08-2014)

4. ## Re: Sample size calculation; negative binomial distribution

This paper should be helpful (with the methodology, not the code).
http://www.ncbi.nlm.nih.gov/pubmed/24038204

T. Ryan

5. ## Re: Sample size calculation; negative binomial distribution

Hi Dason,

Thanks for the suggestion. I have implemented it. However, I am looking for a more accurate way to estimate the sample size. Do you have any suggestions for modifying my code to solve for n directly?

Thanks

Originally Posted by Dason
The simplest algorithm if you already have a way to calculate power given a sample size is to take a guess at the optimal sample size and calculate the power for that. If the power was too low then increase the sample size and try again. If the power was too high then decrease the sample size and try again. Eventually you'll converge to the required sample size.

6. ## Re: Sample size calculation; negative binomial distribution

What do you mean by 'more accurate way to estimate sample size'? The method should be fine in terms of accuracy. Are you saying you want a faster method? Or is your method to obtain power not very accurate?

7. ## Re: Sample size calculation; negative binomial distribution

Dason,

The method to obtain the sample size is not very accurate, since I am only getting a ballpark estimate. Below is my updated code. I also have a range of shape and rate parameters to explore, so there are now three nested loops to run to get power.

I was thinking that this formula:

n0 >= (z_alpha/2 * sqrt(V0) + z_beta * sqrt(V1))^2 / (log(r1/r0))^2

(see equation (9) of the journal article) may be a good approach to get the sample size. However, I do not know how to obtain the estimates of V0 and V1 (the variances for the two groups) from the GLM test statistic that I am running for my power analysis.
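For what it's worth, here is a hedged sketch of that closed-form calculation, using one common choice of V0 and V1 for the log rate ratio under a negative binomial model (my reading of the methodology in the paper linked earlier in the thread, so verify the variance formulas against equation (9) before relying on them). All numeric inputs come from the thread:

```r
alpha <- 0.04; power <- 0.90
k   <- 0.4            # dispersion (note: theta in rnegbin is 1/k)
mu0 <- 0.8/52*48      # expected events per subject, reference arm
rr  <- 0.6            # rate ratio under Ha
mu1 <- mu0 * rr

# Per-subject variance contributions to log(rate ratio), 1:1 allocation:
V1 <- 1/mu0 + 1/mu1 + 2*k   # under the alternative
mubar <- (mu0 + mu1) / 2
V0 <- 2/mubar + 2*k         # under the null (pooled-rate choice)

z_a <- qnorm(1 - alpha/2)
z_b <- qnorm(power)
n0 <- ceiling((z_a*sqrt(V0) + z_b*sqrt(V1))^2 / log(rr)^2)
n0  # per-arm sample size
```

A sanity check is to run the simulation at the resulting `n0` and confirm the estimated power lands near the 90% target; the closed-form answer and the GLM-based simulation will not agree exactly.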

Code:
```
library(MASS)

p_rate <- c(0.8, 0.9)
shape  <- c(0.5, 0.6)
n      <- c(100, 150, 200, 250, 300, 350, 400, 450, 500, 550)

alpha <- 0.04
int   <- 1000

result <- matrix(nrow = length(shape), ncol = length(p_rate),
                 dimnames = list(shape, p_rate))
sample <- vector("list", length(n))  # note: names(<-) on an empty list() fails
names(sample) <- n

for (m in 1:length(n)) {
  for (j in 1:length(p_rate)) {
    for (l in 1:length(shape)) {

      k   <- 1/shape[l]       # theta for rnegbin = 1/dispersion
      mu0 <- p_rate[j]/52*48  # placebo exacerbation rate over 48 weeks
      mu1 <- mu0*0.6          # exacerbation rate for dose 1
      mu2 <- mu1              # exacerbation rate for dose 2

      cat("starting", l, j, k, m, mu0, "\n")

      t0  <- rep(0, n[m])
      t1  <- rep(1, n[m])
      ta1 <- c(t0, t1)
      ta2 <- c(t0, t1)
      c1 <- c2 <- test1 <- test2 <- NULL

      set.seed(123)  # same seed for every scenario

      for (i in 1:int) {

        pbo <- rnegbin(n[m], theta = k, mu = mu0)
        d1  <- rnegbin(n[m], theta = k, mu = mu1)
        d2  <- rnegbin(n[m], theta = k, mu = mu2)
        r1  <- c(pbo, d1)
        r2  <- c(pbo, d2)

        dt1 <- data.frame(ta1 = ta1, r1 = r1)
        dt2 <- data.frame(ta2 = ta2, r2 = r2)

        fm1 <- glm.nb(r1 ~ ta1, data = dt1)
        fm2 <- glm.nb(r2 ~ ta2, data = dt2)

        test1 <- c(test1, summary(fm1)$coef[2, 4])
        test2 <- c(test2, summary(fm2)$coef[2, 4])
        c1 <- c(c1, summary(fm1)$coef[2, 3])
        c2 <- c(c2, summary(fm2)$coef[2, 3])
      }

      cat("done", l, j, m, "\n")

      # power: both comparisons significant
      result[l, j] <- sum((test1 < alpha) & (test2 < alpha)) / int
      sample[[m]] <- result
    }
  }
}

Sys.time()
```
Originally Posted by Dason
What do you mean by 'more accurate way to estimate sample size'? The method should be fine in terms of accuracy. Are you saying you want a faster method? Or is your method to obtain power not very accurate?


