# Sample size calculation; negative binomial distribution

#### biobee

##### New Member
I am interested in doing a sample size calculation for exacerbation rates. My data are drawn from a negative binomial distribution and my test statistic is based on a negative binomial GLM. Currently I have code that calculates power for a given sample size. How do I modify this code to calculate the sample size?
The input parameters for the sample size calculation are:
- Alpha level = 0.04
- Target power = 90%
- Event rate for the reference group: mu0 <- 0.8/52*48
- Rate ratio (rr) under Ha: rr = mu1/mu0, where mu1 <- mu0*0.6
- Average exposure time = 48 weeks
- Sample size allocation ratio = 1, so each group has equal sample size (n1 = n0)
- Dispersion parameter k = 0.4
Code:
``````
library(MASS)   # for rnegbin() and glm.nb()

n <- 400            # sample size per group
alpha <- 0.04       # significance level
int <- 1000         # number of simulated trials
k <- 1/0.4          # theta (shape) = 1/dispersion
mu0 <- 0.8/52*48    # control exacerbation rate over 48 weeks
mu1 <- mu0*0.6      # exacerbation rate for trt1 (rate ratio 0.6)
mu2 <- mu1          # exacerbation rate for trt2

t0 <- rep(0, n)
t1 <- rep(1, n)
ta1 <- c(t0, t1)    # treatment indicator: control vs trt1
ta2 <- c(t0, t1)    # treatment indicator: control vs trt2
c1 <- NULL
c2 <- c1
test1 <- c1
test2 <- c1

for (i in 1:int) {

  pbo <- rnegbin(n, theta = k, mu = mu0)
  d1  <- rnegbin(n, theta = k, mu = mu1)
  d2  <- rnegbin(n, theta = k, mu = mu2)
  r1 <- c(pbo, d1)
  r2 <- c(pbo, d2)

  dt1 <- data.frame(ta1, r1)
  dt2 <- data.frame(ta2, r2)

  fm1 <- glm.nb(r1 ~ ta1, data = dt1)
  fm2 <- glm.nb(r2 ~ ta2, data = dt2)

  # p-values for the treatment effect
  test1 <- c(test1, summary(fm1)$coef[2, 4])
  test2 <- c(test2, summary(fm2)$coef[2, 4])

  # z statistics for the treatment effect
  c1 <- c(c1, summary(fm1)$coef[2, 3])
  c2 <- c(c2, summary(fm2)$coef[2, 3])

}

# power: proportion of trials where both comparisons are significant
result <- sum((test1 < alpha) & (test2 < alpha))/int
result
``````
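One way to move from power at a fixed n toward a sample-size search is to wrap the simulation in a function of n that returns the simulated power. This is a minimal sketch under my own naming (`nb_power` is not from the original post); it keeps only one of the two comparisons for brevity and uses fewer simulations per call:

```r
library(MASS)  # rnegbin, glm.nb

# Simulated power of the negative binomial Wald test for a two-arm
# comparison at sample size n per group (single comparison only).
nb_power <- function(n, mu0 = 0.8 / 52 * 48, rr = 0.6,
                     k = 0.4, alpha = 0.04, int = 200) {
  theta <- 1 / k                   # rnegbin's theta is 1/dispersion
  arm <- rep(0:1, each = n)        # 0 = control, 1 = treatment
  hits <- 0
  for (i in seq_len(int)) {
    y <- c(rnegbin(n, mu = mu0, theta = theta),
           rnegbin(n, mu = mu0 * rr, theta = theta))
    fit <- glm.nb(y ~ arm)
    p <- summary(fit)$coefficients[2, 4]   # Wald p-value for arm
    hits <- hits + (p < alpha)
  }
  hits / int
}

set.seed(123)
nb_power(100)   # simulated power at n = 100 per group
```

With the simulation packaged this way, you can evaluate power over a grid of n values, or hand the function to a search routine, without duplicating the loop body each time.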

#### Dason

The simplest algorithm if you already have a way to calculate power given a sample size is to take a guess at the optimal sample size and calculate the power for that. If the power was too low then increase the sample size and try again. If the power was too high then decrease the sample size and try again. Eventually you'll converge to the required sample size.
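This increase/decrease search can be made systematic with bisection: bracket the target power between a low n (power below target) and a high n (power above target), then repeatedly halve the interval. A minimal sketch, assuming you have wrapped your power calculation in a `power_at_n(n)` function; the analytic toy curve below just stands in for a real simulated power function:

```r
# Bisection search for the smallest n whose power reaches the target.
# Assumes power_at_n is (on average) non-decreasing in n, with
# power_at_n(lo) < target and power_at_n(hi) >= target at entry.
find_n <- function(power_at_n, target = 0.9, lo = 10, hi = 2000) {
  while (hi - lo > 1) {
    mid <- floor((lo + hi) / 2)
    if (power_at_n(mid) >= target) hi <- mid else lo <- mid
  }
  hi
}

# Toy stand-in: a smooth power curve that crosses 0.9 near n = 229.
toy_power <- function(n) pnorm(sqrt(n / 100) - 0.23)

find_n(toy_power, target = 0.9)   # -> 229
```

Because simulated power is noisy, use a large number of simulations per call (or re-check the final n with extra simulations) so the bracketing comparisons are reliable.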

#### biobee

##### New Member
Hi Dason,

Thanks for the suggestion. I have implemented it, but I am looking for a more accurate way to estimate the sample size. Do you have any suggestions for modifying my code to get n?

Thanks


#### Dason

What do you mean by 'more accurate way to estimate sample size'? The method should be fine in terms of accuracy. Are you saying you want a faster method? Or is your method to obtain power not very accurate?

#### biobee

##### New Member
Dason,

The method to obtain the sample size is not very accurate, since I am only giving ballpark estimates of the sample sizes. Below is my updated code. I also have a range of shape and rate parameters to play with, so there are now three loops I need to run to get the power.

I was thinking that this formula

n0 >= ( z_(alpha/2) * sqrt(V0) + z_beta * sqrt(V1) )^2 / ( log(r1/r0) )^2

(see equation (9) of the journal article)

may be a good approach for getting the sample size. However, I do not know how to obtain the estimates of V0 and V1 (the variances for the two groups) from the glm test statistic that I am running for my power analysis.

Code:
``````
library(MASS)

p_rate <- c(0.8, 0.9)        # annual placebo rates to try
shape  <- c(0.5, 0.6)        # dispersion parameters to try
n      <- c(100, 150, 200, 250, 300, 350, 400, 450, 500, 550)

alpha <- 0.04
int   <- 1000

result <- matrix(nrow = length(shape), ncol = length(p_rate),
                 dimnames = list(shape, p_rate))
sample <- vector("list", length(n))   # one power matrix per sample size
names(sample) <- n

for (m in 1:length(n)) {
  for (j in 1:length(p_rate)) {
    for (l in 1:length(shape)) {

      k   <- 1/shape[l]          # theta (shape) = 1/dispersion
      mu0 <- p_rate[j]/52*48     # placebo exacerbation rate over 48 weeks
      mu1 <- mu0*0.6             # exacerbation rate for dose 1
      mu2 <- mu1                 # exacerbation rate for dose 2

      cat("starting", l, j, k, m, mu0, "\n")

      t0 <- rep(0, n[m])
      t1 <- rep(1, n[m])
      ta1 <- c(t0, t1)
      ta2 <- c(t0, t1)
      c1 <- NULL
      c2 <- c1
      test1 <- c1
      test2 <- c1

      set.seed(123)

      for (i in 1:int) {

        pbo <- rnegbin(n[m], theta = k, mu = mu0)
        d1  <- rnegbin(n[m], theta = k, mu = mu1)
        d2  <- rnegbin(n[m], theta = k, mu = mu2)
        r1 <- c(pbo, d1)
        r2 <- c(pbo, d2)

        dt1 <- data.frame(ta1, r1)
        dt2 <- data.frame(ta2, r2)

        fm1 <- glm.nb(r1 ~ ta1, data = dt1)
        fm2 <- glm.nb(r2 ~ ta2, data = dt2)

        test1 <- c(test1, summary(fm1)$coef[2, 4])  # p-values
        test2 <- c(test2, summary(fm2)$coef[2, 4])

        c1 <- c(c1, summary(fm1)$coef[2, 3])        # z statistics
        c2 <- c(c2, summary(fm2)$coef[2, 3])
      }

      cat("done", l, j, m, "\n")

      # power: both comparisons significant
      result[l, j] <- sum((test1 < alpha) & (test2 < alpha))/int
      sample[[m]] <- result

    }
  }
}

Sys.time()   # timestamp when the run finished
``````
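Regarding V0 and V1: they are not read off the fitted glm. They are the variances of the estimated log rate ratio under H0 and H1, which have a closed form for negative binomial rates. The expressions below use one common parameterization (each group contributes 1/(t*mu) + k to the variance); published variants differ in how the null-hypothesis rates are chosen, so please check this against equation (9) of your article before relying on the numbers. A hedged sketch (the function name is mine):

```r
# Closed-form sample size per group for a negative binomial rate
# ratio test.  Variance expressions are one common choice, NOT
# necessarily the article's equation (9) -- verify before use.
nb_n_per_group <- function(alpha = 0.04, power = 0.90,
                           mu0 = 0.8 / 52 * 48,  # rate over 48 weeks
                           rr = 0.6,             # rate ratio under H1
                           k = 0.4,              # dispersion (1/theta)
                           t = 1) {              # exposure (already in mu0)
  mu1 <- mu0 * rr
  V0 <- 2 / (t * mu0) + 2 * k                    # log-RR variance under H0
  V1 <- 1 / (t * mu0) + 1 / (t * mu1) + 2 * k    # log-RR variance under H1
  za <- qnorm(1 - alpha / 2)
  zb <- qnorm(power)
  ceiling((za * sqrt(V0) + zb * sqrt(V1))^2 / log(rr)^2)
}

nb_n_per_group()   # n per group for the parameters in this thread
```

You can then sanity-check the closed-form n against the simulation: the simulated power at the returned n should land near the target.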