1. ## Re: Predicting accidents

It is the same logic that is used when a binomial with a large n (and small p) is approximated by the normal distribution.

2. ## Re: Predicting accidents

I guess that rogojel means that it is the same as when a binomial with a large n and a small p is approximated by a Poisson distribution (the rule of thumb is something like n*p < 7), and when the binomial or Poisson is approximated by a normal when n*p is large (rule of thumb: n*p > 5 and n*(1-p) > 5, or mu > 5 in the Poisson case).
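As a rough numerical check of these rules of thumb, here is a minimal Python sketch using only the standard library. The helper names and the two example settings (n = 1000 with p = 0.002 and with p = 0.3) are my own choices for illustration, not values from this thread. It compares the exact binomial probabilities with the Poisson and normal approximations and reports the largest pointwise difference.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial pmf, computed in log space to avoid overflow (needs 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def poisson_pmf(k, mu):
    """Poisson pmf with mean mu, computed in log space."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def normal_pdf(x, mean, var):
    """Normal density, used here as a pointwise approximation to the pmf."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_abs_errors(n, p):
    """Largest pointwise error of the Poisson and of the normal approximation."""
    mu = n * p              # mean of the binomial
    var = n * p * (1 - p)   # variance of the binomial
    err_pois = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(n + 1))
    err_norm = max(abs(binom_pmf(k, n, p) - normal_pdf(k, mu, var)) for k in range(n + 1))
    return err_pois, err_norm

# Hypothetical settings: same n, once with a small p and once with a moderate p.
small_pois, small_norm = max_abs_errors(1000, 0.002)  # n*p = 2
large_pois, large_norm = max_abs_errors(1000, 0.3)    # n*p = 300

print("n*p = 2:   Poisson error", small_pois, "normal error", small_norm)
print("n*p = 300: Poisson error", large_pois, "normal error", large_norm)
```

With these settings the Poisson approximation should come out closer when n*p is small, and the normal approximation closer when n*p is large, which is what the rules of thumb suggest.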

Please note that even if n is large and the response is approximately normal, the variance will still not be constant, as it is influenced by p and mu.

The parameter estimates will still be approximately normal as usual in maximum likelihood estimation, provided that the sample is large enough.

3. ## Re: Predicting accidents

Discussions I have seen of binomial distributions suggest that linear models rarely if ever work with them regardless of N. So what gretagarbo said above makes a lot of sense to me.

4. ## Re: Predicting accidents

Originally Posted by noetsi
> Discussions I have seen of binomial distributions suggest that linear models rarely if ever work with them regardless of N. So what gretagarbo said above makes a lot of sense to me.

Sorry, now I don't really understand what Noetsi means.

When I say a linear model I also include a generalized linear model (glm). And that also includes the logit model (and the probit model and so on).

The logit model is based on the binomial distribution. For example, take the number of unhealthy persons among n=1000 exposed persons versus the number of unhealthy persons among n=1000 unexposed persons. That can be estimated with maximum likelihood in a glm.

If the "p" (the probability of being unhealthy) is small then the data can be approximated by a Poisson model and one could run a Poisson regression model. A logit model is used a lot!
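To make the exposed/unexposed example concrete, here is a small sketch with made-up counts (30 unhealthy among 1000 exposed, 10 among 1000 unexposed; these numbers are assumptions for illustration only). With a single binary exposure indicator, the maximum likelihood slope of a logit model equals the log odds ratio of the 2x2 table, and the slope of a Poisson regression with log(n) as offset equals the log rate ratio, so both can be computed by hand.

```python
import math

def log_odds_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Slope a logit model with one binary exposure indicator would estimate."""
    odds_exposed = cases_exposed / (n_exposed - cases_exposed)
    odds_unexposed = cases_unexposed / (n_unexposed - cases_unexposed)
    return math.log(odds_exposed / odds_unexposed)

def log_rate_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Slope a Poisson regression with offset log(n) would estimate."""
    rate_exposed = cases_exposed / n_exposed
    rate_unexposed = cases_unexposed / n_unexposed
    return math.log(rate_exposed / rate_unexposed)

# Hypothetical counts, not data from this thread.
logit_slope = log_odds_ratio(30, 1000, 10, 1000)
poisson_slope = log_rate_ratio(30, 1000, 10, 1000)

print("logit slope (log odds ratio):  ", logit_slope)
print("Poisson slope (log rate ratio):", poisson_slope)
```

Because p is small in both groups, the two slopes should be close to each other, which is the sense in which the Poisson model approximates the binomial one here.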

(And Noetsi, it is very common to use logit and Poisson regression in economics. Also, there is no disagreement about this among statisticians.)

5. ## Re: Predicting accidents

hi Greta,
IIRC, in the case of N going to infinity and p going to zero such that Np is a constant, the limiting distribution of the binomial will be the normal with mean Np and some variance involving p? This means that a distribution of counts can be approximated by a continuous distribution. The Poisson is from this POV less interesting, being a discrete distribution itself.

regards
rogojel

6. ## Re: Predicting accidents

Possibly a bit over my head, but it's a very interesting discussion.

So in my case I think multiple linear regression is still out. In Shapiro-Wilk tests of normality, the p-value was smaller than alpha = .05 for 3 out of the 4 data sets I have.

7. ## Re: Predicting accidents

Maybe I have missed something about the binomial-Poisson-normal approximations. But this is what Wikipedia says:

"The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to B(n, p) of the binomial distribution if n is sufficiently large and p is sufficiently small."

But I guess that what I wrote was not careful enough.
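For reference, the limit described in the quote can be written out as follows. This is the standard statement with p = λ/n, so that np = λ stays fixed while n grows; k denotes the number of events.

```latex
\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k}
  = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots
```

So with np held fixed the limit is the Poisson probability itself. The normal approximation is a separate step that applies when λ = np is large.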

(Sorry, I don't know what "IIRC" or "POV" is.)

The original poster (OP) wrote:
> I want to find whether there is a relationship between the number of hazard inspections completed and whether that predicts the amount of accidents that occurs.

If the "workplace" is a lorry/truck company, "n_i" is the number of trucks in each company, p_i is small, constant and equal for each truck, and each truck accident is independent, then I believe that the binomial distribution can be approximated by a Poisson.

If the workplaces have different sizes that can't be scaled as easily as above with the number of trucks, the model could possibly be scaled with an offset variable like the turnover of the company. (Imagine large and small car repair companies.)
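A sketch of what such an offset looks like in a Poisson regression, with symbols that I am introducing here for illustration: μ_i is the expected number of accidents at workplace i, t_i is the size measure (number of trucks, or turnover), and x_i is the number of hazard inspections completed.

```latex
\log \mu_i = \log t_i + \beta_0 + \beta_1 x_i
\quad\Longleftrightarrow\quad
\frac{\mu_i}{t_i} = \exp(\beta_0 + \beta_1 x_i)
```

The coefficient on log t_i is fixed at 1, so the model describes the accident rate per unit of size rather than the raw count.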

- - -

I should also say that if there is "over variation" (overdispersion), meaning that the variance is larger than what can be expected from a Poisson model, then it is sometimes possible to model the events with a negative binomial model.
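A crude first look at this is the ratio of the sample variance to the sample mean of the counts, since the two are equal under a Poisson model. The sketch below uses invented accident counts (not the OP's data) and ignores any predictors, so it is only a rough screen and not a formal test.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; roughly 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical accident counts for ten workplaces.
accidents = [0, 1, 0, 3, 7, 0, 2, 12, 1, 0]

ratio = dispersion_ratio(accidents)
print("variance / mean =", ratio)
```

A ratio well above 1, as in this made-up example, would be a hint to consider a negative binomial model.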

8. ## Re: Predicting accidents

Hi Greta,
below is a link explaining what I meant, but you will get any number of links if you search for "normal approximation binomial". Again, this is interesting because it is moving between count and measurement data, showing that under the right circumstances counts can be closely approximated by continuous data, sort of justifying what noetsi meant.

https://surfstat.anu.edu.au/surfstat-home/3-2-8.html

BTW, IIRC is "If I Recall Correctly", POV is simply "Point of View", and BTW is "By the Way". Sorry for using dated programmer slang; I never know who is familiar with it.

regards
rogojel