Note that I do not know whether the author's interpretation is correct, but he is a statistician who wrote a book for SAS, so I am guessing it is reasonable.

http://blogs.sas.com/content/iml/2012/04/04/fitting-a-poisson-distribution-to-data-in-sas/

Put another way, I want to know whether a data set [in this case randomly generated] actually matches what you would expect from a Poisson distribution. This is what the link shows you how to do [assuming it is correct, of course].

I can't send you the data set, of course. I generated a random sample of 500 cases from a Poisson distribution with SAS code. If you want the SAS code that generates the data set I could send it, but I assume you would run it in R instead.

Code:

```
proc means data=mydata mean var n;
var n;
run;
```
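For anyone without SAS, here is a rough Python sketch of the same idea: draw 500 Poisson values and report the mean, variance, and count. This is not my SAS code. The rate of 7, the seed, and the `poisson_draw` helper (Knuth's multiplication method) are all assumptions made for illustration.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(12345)   # arbitrary seed, only for reproducibility
lam = 7.0                    # assumed rate, not taken from the thread
sample = [poisson_draw(lam, rng) for _ in range(500)]

n = len(sample)
mean = statistics.mean(sample)
var = statistics.variance(sample)  # n - 1 denominator, the PROC MEANS default

# For Poisson data the variance-to-mean ratio should be close to 1.
print(n, round(mean, 3), round(var, 3), round(var / mean, 3))
```

If the mean and variance come out far apart, that is already a hint the data are not Poisson.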

This comment is one that always bemuses me. It reflects the practical reality that all or nearly all actual data sets are going to be discrete, even though we commonly treat them as interval if they meet certain criteria. Only in theoretical distributions will the data be continuous. That suggests it is impossible to match real-world data sets exactly to theoretical distributions, as is done in Q-Q plots. The most you can expect is that the actual data will come close to the theoretical ones.

Note, in particular, that the PDF for X is always discrete, because it is based on a finite number of measurements.

A Poissonness plot is apparently another good option for Poisson data (an alternative to the Q-Q plot). I have not found a link to it; I am waiting for a book by Michael Friendly that covers it.
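As far as I understand it, the Poissonness plot puts the count value k on the x-axis and ln(k! * n_k / N) on the y-axis, where n_k is how often k was observed and N is the sample size. For Poisson data the points should fall near a straight line with slope ln(lambda) and intercept -lambda. Below is a minimal Python sketch that computes those points and fits a line; the simulated sample, the rate of 7, and all function names are my own assumptions, and the fit is a plain unweighted least-squares line.

```python
import math
import random
from collections import Counter

def poisson_draw(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def poissonness_points(sample):
    """Return (k, ln(k! * n_k / N)) for every observed count value k."""
    total = len(sample)
    counts = Counter(sample)
    return [(k, math.lgamma(k + 1) + math.log(n_k / total))
            for k, n_k in sorted(counts.items())]

def fit_line(points):
    """Ordinary least-squares slope and intercept through the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

rng = random.Random(2024)    # arbitrary seed
sample = [poisson_draw(7.0, rng) for _ in range(500)]  # assumed rate of 7
points = poissonness_points(sample)
slope, intercept = fit_line(points)

# exp(slope) and -intercept are two rough estimates of lambda.
print(round(math.exp(slope), 2), round(-intercept, 2))
```

Values of k that were observed only once or twice are noisy, so the ends of the plot tend to wander off the line even for genuine Poisson data.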

http://www.qualitydigest.com/inside/quality-insider-article/what-chunky-data.html has a good explanation of what chunky data are and how they affect one statistical tool used in industrial statistics.

You deal with some interesting stuff, miner. Makes me realize how limited my Six Sigma training really was.

In practice many academics, if not mathematicians, bend the rules as well. They generate means for ordinal data, they use linear regression when the data have at least 12 distinct levels [which is called "interval like"], and so on. And they disagree with each other massively, including in journals that publish fundamentally contradictory comments on these types of issues. This is part of the reason I am learning simulation methods in the first place.

As you say, it helps to have a closed system in which you can test your methods.

The author of the link above, who is a statistician working at SAS, says that Poisson data with a mean of 7 has a normal distribution. With enough levels [12 maybe?] I would think it should be possible in practice to analyse it with, say, linear regression or ANOVA. I am not sure how this works out with the variance, since a Poisson distribution has a specific variance equal to its mean.
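On the claim that Poisson data with a mean of 7 looks normal, here is a small Python sketch, under my own assumptions, that compares the Poisson(7) probabilities with the density of a normal distribution that has the same mean and variance (both 7). The range 0 to 20 and the variable names are arbitrary choices of mine.

```python
import math
from statistics import NormalDist

lam = 7.0
normal = NormalDist(mu=lam, sigma=math.sqrt(lam))  # same mean and variance

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# Difference between the Poisson probability and the normal density at each k.
gaps = {k: poisson_pmf(k, lam) - normal.pdf(k) for k in range(0, 21)}
max_gap = max(abs(g) for g in gaps.values())

# A Poisson distribution has skewness 1/sqrt(lambda); a normal has 0.
skewness = 1 / math.sqrt(lam)

for k, gap in gaps.items():
    print(k, round(poisson_pmf(k, lam), 4), round(normal.pdf(k), 4), round(gap, 4))
print("largest gap:", round(max_gap, 4), "skewness:", round(skewness, 3))
```

So the match is close but only approximate, and some right skew remains at this mean. On the variance question: because the variance equals the mean, groups with different means will also have different variances, which is where the equal-variance assumption of ANOVA or linear regression could get strained.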

If you want to test whether certain data fit a theoretical discrete distribution, why not try the chi-square goodness-of-fit test?
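A minimal sketch of what that test could look like for Poisson data, in plain Python. Everything here is an assumption for illustration: the simulated sample, the rule of pooling neighbouring cells until each expected count is at least 5, and the helper names. Lambda is estimated by the sample mean, so the degrees of freedom are taken as the number of cells minus 2.

```python
import math
import random
from collections import Counter

def poisson_draw(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    while a < df / 2.0 - 1e-9:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

def chi_square_gof_poisson(sample, min_expected=5.0):
    n = len(sample)
    lam = sum(sample) / n                      # estimated from the data
    counts = Counter(sample)
    k_max = max(sample)
    probs = [poisson_pmf(k, lam) for k in range(k_max)]
    probs.append(1.0 - sum(probs))             # last cell is "k_max or more"
    cells = [(counts.get(k, 0), n * p) for k, p in enumerate(probs)]

    merged, obs_acc, exp_acc = [], 0.0, 0.0
    for obs, exp in cells:                     # pool neighbours left to right
        obs_acc += obs
        exp_acc += exp
        if exp_acc >= min_expected:
            merged.append((obs_acc, exp_acc))
            obs_acc = exp_acc = 0.0
    if exp_acc > 0 and merged:                 # fold leftover tail into last cell
        last_obs, last_exp = merged.pop()
        merged.append((last_obs + obs_acc, last_exp + exp_acc))
    if len(merged) < 3:
        raise ValueError("too few cells for a test with one estimated parameter")

    stat = sum((o - e) ** 2 / e for o, e in merged)
    df = len(merged) - 2                       # cells - 1 - one estimated parameter
    return stat, df, chi2_sf(stat, df)

rng = random.Random(99)                        # arbitrary seed
sample = [poisson_draw(7.0, rng) for _ in range(500)]
stat, df, p_value = chi_square_gof_poisson(sample)
print(round(stat, 3), df, round(p_value, 4))
```

A small p-value would suggest the counts do not fit a Poisson distribution; a large one only means the test found no evidence against it.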

The commentary is so negative, in my experience, that I generally avoid such tests. Note that I don't know whether the tests I have seen are the chi-square test you mention, but based on my SEM classes chi-square has well-known problems with power.

All I can say, Dason, is that these tests are regularly panned by commentators as being deceptive, and graphical alternatives are stressed instead because of this issue. If you are saying this criticism is invalid, that is interesting to know.

To me it is somewhat concerning that if I have exactly the same sample distribution I will sometimes reject it and sometimes not, purely because of sample size. And yes, I know this is an issue with other methods, but from what I have read it is more of an issue with these tests than with other ones.
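To make the sample-size point concrete, here is a toy Python sketch with made-up numbers: four categories, a hypothesised 25% in each, and observed proportions that stay fixed while only n changes. The proportions, the sample sizes, and the function name are all my own assumptions; 7.815 is the usual 5% critical value for 3 degrees of freedom.

```python
def chi_square_stat(observed_props, expected_props, n):
    """Chi-square statistic when the given proportions are seen in n cases."""
    return sum((n * o - n * e) ** 2 / (n * e)
               for o, e in zip(observed_props, expected_props))

expected_props = [0.25, 0.25, 0.25, 0.25]   # hypothesised distribution
observed_props = [0.27, 0.25, 0.24, 0.24]   # same small departure at every n
CRITICAL_05_DF3 = 7.815                     # 5% critical value, 3 df

for n in (100, 500, 1000, 5000):
    stat = chi_square_stat(observed_props, expected_props, n)
    verdict = "reject" if stat > CRITICAL_05_DF3 else "do not reject"
    print(n, round(stat, 2), verdict)
```

The statistic is just n times a fixed quantity, so the same small departure goes from "fits fine" to "rejected" once n is large enough.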

An interesting take on this subject.

http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless
