- Thread starter leo nidas

Hi there,

Can anyone tell me how to produce overdispersed Poisson data? An algorithm or something? I'm finding it difficult.

Thanks in advance for any help!

http://en.wikipedia.org/wiki/Negative_binomial_distribution

The negative binomial distribution with size = n and prob = p has density

p(x) = Gamma(x+n)/(Gamma(n) x!) p^n (1-p)^x,  for x = 0, 1, 2, ...

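It is also often written in terms of its mean mu, with size acting as 'the dispersion parameter' and prob (p) = size/(size+mu). Here mu is the equivalent of lambda in a Poisson distribution, i.e. the expected value, and the variance is mu + mu^2/size.

If you keep the n (size) parameter low, the distribution is far more dispersed than a Poisson with the same mean. As n grows large with mu held fixed, p(x) approaches a Poisson distribution.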
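A quick numerical check of the two parametrizations, as a sketch: it compares the density formula written out by hand against R's `dnbinom`, then confirms that the (size, mu) form matches the (size, prob) form with prob = size/(size+mu). The values n = 3, p = 0.4 and mu = 4.5 are arbitrary illustrative choices, not from the thread.

```r
# Illustrative values (assumption, not from the thread): size n = 3, prob p = 0.4
n <- 3
p <- 0.4
x <- 0:10

# Density written out by hand, following the formula in the reply
by_hand <- gamma(x + n) / (gamma(n) * factorial(x)) * p^n * (1 - p)^x
all.equal(by_hand, dnbinom(x, size = n, prob = p))   # TRUE

# (size, mu) form versus (size, prob) form with prob = size/(size + mu)
mu <- 4.5
all.equal(dnbinom(x, size = n, mu = mu),
          dnbinom(x, size = n, prob = n / (n + mu))) # TRUE
```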
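A small simulation makes the point about n concrete. It is a sketch with an arbitrary seed, mean mu = 10 and 1e5 draws (all assumptions): the Poisson and the large-size negative binomial have variance close to 10, while size = 2 gives roughly mu + mu^2/2 = 60. The last sample builds the same NB(size = 2, mu = 10) as a gamma-Poisson mixture (draw a random mean, then a Poisson count around it). That construction is a standard equivalent, added here for illustration rather than taken from the reply.

```r
set.seed(42)                                   # arbitrary seed (assumption)
N  <- 1e5                                      # number of draws (assumption)
mu <- 10                                       # common mean for every sample (assumption)

y_pois  <- rpois(N, mu)                        # Poisson: variance mu = 10
y_small <- rnbinom(N, size = 2, mu = mu)       # low size: variance 10 + 10^2/2 = 60
y_large <- rnbinom(N, size = 1e4, mu = mu)     # high size: variance 10 + 10^2/1e4 = 10.01

# The same NB(size = 2, mu = 10) as a gamma-Poisson mixture:
# each observation gets a random mean, then a Poisson count around it.
lambda <- rgamma(N, shape = 2, rate = 2 / mu)  # E[lambda] = mu, Var(lambda) = mu^2/2
y_mix  <- rpois(N, lambda)                     # variance mu + mu^2/2 = 60

round(c(pois = var(y_pois), small = var(y_small),
        large = var(y_large), mix = var(y_mix)), 1)
```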

I am afraid I don't completely get it though.

My first goal was something simple: produce some values from a Poisson model (no overdispersion) and then estimate the parameters to see whether I get them right. So I did:

Step 1: Set b0=5, b1=-0.5.

Step 2: Produce 1000 random numbers from an exponential distribution with mean 3 (X).

Step 3: For i=1:1000, produce y(i)=Poisson(exp(b0+b1*x(i))) (so we have 1000 y's).

Step 4: Fit a Poisson regression to the data X, Y and check that the estimates are good (near 5 and -0.5).
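The four steps above can be sketched in R as follows. The seed is an arbitrary assumption; `rexp` takes a rate, so mean 3 means rate = 1/3, and `rpois` is vectorized over its mean, which replaces the explicit loop in Step 3. The last line adds a Pearson estimate of the dispersion, which should sit near 1 when the data really are Poisson; it serves as a baseline for the overdispersed case.

```r
set.seed(1)                          # arbitrary seed (assumption), for reproducibility
n  <- 1000
b0 <- 5                              # Step 1: true coefficients
b1 <- -0.5
x  <- rexp(n, rate = 1/3)            # Step 2: exponential with mean 3 (rate = 1/mean)
y  <- rpois(n, exp(b0 + b1 * x))     # Step 3: vectorized, no loop needed
fit <- glm(y ~ x, family = poisson)  # Step 4: Poisson regression
coef(fit)                            # should be near 5 and -0.5

# Pearson estimate of the dispersion; close to 1 when there is no overdispersion
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
```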

Everything is fine above.

Now, if I wanted to produce overdispersed data, should I:

Step 1: Set b0=5, b1=-0.5, and let the dispersion parameter be φ=2.

Step 2: Produce 1000 random numbers from an exponential distribution with mean 3 (X).

Step 3: For i=1:1000, produce y(i)=?? (so we have 1000 y's).

What parameters of the negative binomial should y(i) be drawn from in order to get overdispersed data with, say, dispersion parameter 2? I want to simulate it and then see if I can estimate it back.

Thanks again for any help.
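One common reading of "dispersion parameter φ = 2" is the quasi-Poisson sense, Var(y) = φ·mu. Under that assumption the negative binomial can be tuned by letting size vary with the mean: since Var = mu + mu^2/size, choosing size = mu/(φ - 1) gives Var = mu + mu(φ - 1) = φ·mu (this needs φ > 1). Below is a minimal sketch of the question's steps with that choice; the seed and the quasi-Poisson fit used to recover φ are my assumptions, not something from the thread.

```r
set.seed(1)                          # arbitrary seed (assumption)
n   <- 1000
b0  <- 5
b1  <- -0.5
phi <- 2                             # target dispersion, must be > 1
x   <- rexp(n, rate = 1/3)           # exponential covariate with mean 3
mu  <- exp(b0 + b1 * x)              # mean of each observation

# NB variance is mu + mu^2/size; size = mu/(phi - 1) makes it phi * mu
size <- mu / (phi - 1)
y    <- rnbinom(n, size = size, mu = mu)

fit <- glm(y ~ x, family = quasipoisson)
coef(fit)                            # should be near 5 and -0.5
summary(fit)$dispersion              # Pearson estimate of phi, should be near 2
```

If φ is instead meant as the negative binomial's own size parameter (often called theta, with Var = mu + mu^2/theta and theta held constant), draw `y <- rnbinom(n, size = theta, mu = mu)` with a fixed theta and fit with `glm.nb()` from the MASS package, which estimates theta along with the coefficients.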

I just read the original post and thought: "the easiest way to produce over-dispersed data is to double up on the random process". Intuitively, running one random process and then another random process on its result (without modeling the second one) inserts more variation than the model expects. It is worth knowing because, while I have limited experience with the topic, I suspect it's a common source of over-dispersion in real life.

I wrote some code to illustrate:

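```
nsims = 5000
X1 = numeric(nsims)  # residual deviance of the Poisson fit to y1, per simulation
X2 = numeric(nsims)  # same for y2
X3 = numeric(nsims)  # same for y3
for (i in 1:nsims){
  x = 5*runif(100) + 1          # 100 covariate values in (1, 6)
  y1 = rpois(length(x), x)      # one Poisson draw: mean x, variance x
  y2 = rpois(length(y1), y1)    # Poisson draw around y1: mean x, variance 2x
  y3 = rpois(length(y2), y2)    # once more: mean x, variance 3x
  fit1 = glm(y1 ~ x, family=poisson)
  X1[i] = deviance(fit1)
  fit2 = glm(y2 ~ x, family=poisson)
  X2[i] = deviance(fit2)
  fit3 = glm(y3 ~ x, family=poisson)
  X3[i] = deviance(fit3)
}
# With 100 observations and 2 coefficients the residual df is 98,
# so a well-fitting Poisson model gives a mean deviance of roughly 98.
mean(X1) #109 for me
mean(X2) #208 for me
mean(X3) #298 for me
```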
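The same trick can produce the φ = 2 asked for in the question. If y1 ~ Poisson(mu) and y | y1 ~ Poisson(y1), then E[y] = mu and Var(y) = E[y1] + Var(y1) = 2·mu, so a quasi-Poisson fit should report a dispersion near 2. Since k nested draws give variance k·mu, this only reaches whole-number values of φ. A minimal sketch using the question's b0, b1 and exponential covariate; the seed and the quasi-Poisson fit are my assumptions:

```r
set.seed(1)                          # arbitrary seed (assumption)
n  <- 1000
b0 <- 5
b1 <- -0.5
x  <- rexp(n, rate = 1/3)            # exponential covariate with mean 3
mu <- exp(b0 + b1 * x)               # Poisson regression mean
y1 <- rpois(n, mu)                   # first draw: mean mu, variance mu
y  <- rpois(n, y1)                   # second draw: mean mu, variance 2 * mu
fit <- glm(y ~ x, family = quasipoisson)
coef(fit)                            # should be near 5 and -0.5
summary(fit)$dispersion              # Pearson estimate of phi, should be near 2
```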
