It would be a big help if someone could give a practical example (by hand) of how Poisson regression is used to calculate a time trend line, and calculate a confidence interval for whether there is a trend. Here is some sample data if you would like (Texas viral hepatitis deaths):
Year, Events, Population, Rate per 100000
-----------------------
1990, 108, 16986510, 0.64
1991, 154, 17349000, 0.89
1992, 141, 17655650, 0.80
1993, 212, 18031484, 1.18
1994, 254, 18378185, 1.38
1995, 283, 18723991, 1.51
1996, 353, 19128261, 1.85
1997, 383, 19439337, 1.97
1998, 432, 19759614, 2.19
-----------------------
I understand the algorithm for the least squares slope, and how to test that slope for significance. But I want a trend method that takes into account the variance of the data points. Obviously, if each data point is based on hundreds of events, the slope is more reliable than if each data point is based on just a few events. But least squares totally ignores the variance of the data points. I have also seen chi-square suggested; I know how to do a chi-square test and plan to give it a try, but I saw many more references to Poisson regression for time trend analysis.
I am familiar with the Poisson distribution itself, so that's not the problem. I just can't find a practical example of Poisson regression anywhere, so I don't "get it". A bunch of math equations or SAS code won't help. The only way I can understand an algorithm is to do it by hand. Any progress toward outlining the steps would really help.
Thanks, Daniel
I think this might be what you're looking for. In particular, the salamanders and crab examples seem to have similarity with your problem, and they also use SAS JMP (SAS's GUI app) for analysis, so no code to read.
Thanks for trying to help and looking around. I saw that before, with the salamanders and crabs. I reread it, but unfortunately it doesn't help. It discusses the Poisson distribution (which I understand), but then mostly jumps to pictures of a GUI to do the Poisson regression. I need to understand the underlying algorithm.
In case helpful, here is a different test data set (Viral hepatitis deaths, Bexar County) perhaps more suited to explaining Poisson regression for trend analysis. This data set has a lower number of events:
Year, Events, Population, Rate per 100000
-----------------------
1990, 7, 1185394, 0.59
1991, 9, 1213028, 0.74
1992, 8, 1233289, 0.65
1993, 18, 1260287, 1.43
1994, 14, 1279620, 1.09
1995, 19, 1299486, 1.46
1996, 36, 1318431, 2.73
1997, 49, 1334722, 3.67
1998, 30, 1351304, 2.22
-----------------------
Is there anyone who might explain how Poisson regression calculates a time trend line, and calculates a confidence interval for whether there is a trend? I'm looking for the steps that would be done by hand. It would be a big help.
Thanks,
Daniel
I may not be understanding your question correctly.
Poisson regression is a special case of Generalized Linear Models (GLM). Software typically uses an algorithm called iteratively reweighted least squares (IRLS), or possibly another similar algorithm, to find the parameter estimates, including, in your case, the time trend. Inference comes from large-sample normal approximations.
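For readers who want to see what IRLS actually does, here is a minimal sketch in plain Python (no libraries) for the Texas data, so each arithmetic step corresponds to something that could in principle be done with a calculator. Assumptions not stated in the thread: the model is log(expected deaths) = log(population) + b0 + b1 × (year − 1994), i.e. population enters as an offset so that exp(b1) is the multiplicative change in the rate per year; the interval is a large-sample (Wald) interval b1 ± 1.96·SE; and the function names are hypothetical.

```python
import math

# Texas viral hepatitis deaths, 1990-1998, as posted in the thread.
years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998]
events = [108, 154, 141, 212, 254, 283, 353, 383, 432]
population = [16986510, 17349000, 17655650, 18031484, 18378185,
              18723991, 19128261, 19439337, 19759614]


def weighted_sums(w, t):
    """Entries of X'WX for the design matrix X = [1, t]."""
    s0 = sum(w)
    s1 = sum(wi * ti for wi, ti in zip(w, t))
    s2 = sum(wi * ti * ti for wi, ti in zip(w, t))
    return s0, s1, s2


def poisson_trend_irls(years, events, population, tol=1e-10, max_iter=50):
    """Fit log(mu_i) = log(N_i) + b0 + b1 * t_i by IRLS, with t_i = year - mean(year)."""
    t_mid = sum(years) / len(years)
    t = [yr - t_mid for yr in years]
    offset = [math.log(n) for n in population]
    # Start from a constant rate (the intercept-only estimate) and no trend.
    b0, b1 = math.log(sum(events) / sum(population)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(o + b0 + b1 * ti) for o, ti in zip(offset, t)]
        # Log link: working weight w_i = mu_i, working response (offset removed)
        # z_i = b0 + b1*t_i + (y_i - mu_i) / mu_i.
        z = [b0 + b1 * ti + (y - m) / m for ti, y, m in zip(t, events, mu)]
        s0, s1, s2 = weighted_sums(mu, t)
        r0 = sum(m * zi for m, zi in zip(mu, z))
        r1 = sum(m * ti * zi for m, ti, zi in zip(mu, t, z))
        # Solve the 2x2 weighted least-squares normal equations.
        det = s0 * s2 - s1 * s1
        new_b0 = (s2 * r0 - s1 * r1) / det
        new_b1 = (s0 * r1 - s1 * r0) / det
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    # Standard error of b1 from the inverse of X'WX at the final estimates.
    mu = [math.exp(o + b0 + b1 * ti) for o, ti in zip(offset, t)]
    s0, s1, s2 = weighted_sums(mu, t)
    se_b1 = math.sqrt(s0 / (s0 * s2 - s1 * s1))
    return b0, b1, se_b1, t, mu


b0, b1, se_b1, t, mu = poisson_trend_irls(years, events, population)
lo, hi = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1
print(f"b1 = {b1:.4f} per year (SE {se_b1:.4f}), 95% Wald CI {lo:.4f} to {hi:.4f}")
print(f"rate ratio per year = {math.exp(b1):.3f} ({math.exp(lo):.3f} to {math.exp(hi):.3f})")
```

Each iteration is just a two-unknown weighted least squares fit; what makes it tedious by hand is that the weights (the fitted counts) change every iteration. Under these assumptions, a trend is indicated if the interval for b1 excludes 0, equivalently if the interval for the rate ratio excludes 1.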
Also, I guess if you want to do it by hand, or perhaps with simple formulas in Excel, you could do a simple linear regression of rate vs. year to get the time trend, and then use the large-sample pointwise confidence interval as an approximation. I believe the approximation here errs on the side of being conservative (a wider CI). The large-sample CI formula I refer to is
$\bar{x} \pm z_{1-\alpha/2}\sqrt{\bar{x}/n}$
which is explained, e.g., on p. 11, and in your case $\bar{x}$ is just the yearly rate.
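A sketch of the "simple linear regression of rate vs. year" step for the Texas rates, in plain Python; the function name ols_line is hypothetical. Note that this ordinary least squares fit gives every year equal weight, regardless of how many deaths each rate is based on.

```python
# Texas rates per 100000 as posted in the thread.
years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998]
rates = [0.64, 0.89, 0.80, 1.18, 1.38, 1.51, 1.85, 1.97, 2.19]


def ols_line(x, y):
    """Ordinary least squares fit y = a + b*x; every point gets equal weight."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = ols_line(years, rates)
print(f"OLS trend: {slope:.4f} per 100000 per year")
```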
I'm not sure if this is what you mean by "by hand", and this really is just a rough, approximate approach, but it's the closest thing to "by hand" that comes to mind. Doing IRLS by hand isn't usually feasible except perhaps for very trivial, small-sample examples.
Thanks again for responding, and trying to help.
I thought "by hand" was pretty clear. To clarify, it means doing something with pencil, paper, and a book of reference values. For example, to get a median for 15 values, I could enter them into a program, and push a button. Doing this by hand, I would sort the 15 values from low to high, and select value #8. Or for a chi-squared test, I could enter the values into a program, push a button, and get a bunch of output. By hand (using a simple example as you suggest), I would go through the algorithm for doing chi-square (I won't spell it out here), determine the degrees of freedom, look up the chi-square value in a table, and get the p value.
It seems very unlikely the formula you suggest for the confidence interval would help. If nothing else, it does not take into account the variances of the individual data points.
"By hand" has nothing to do with whether an approximation is rough or precise. It just means carrying out the algorithm by hand, vs pushing a button.
Anyway, thanks again. Still hoping someone can show how Poisson regression calculates a time trend line, and calculates a confidence interval for whether there is a significant trend, using a simple example. It seems a pretty reasonable request to understand the underlying logic.
Daniel
Actually, the formula does account for the variance of the individual points: var = x-bar / n, as explained well in the link I sent you, not to mention multiple obviously relevant Wikipedia pages.
I never implied that "by hand" implies an approximation; I said the method I mentioned is an "approximate approach", meaning it's probably not the best choice, just the best choice if you don't want to push one of those awful buttons.
If you want to do the algorithm yourself, I suggest you start by reading up on IRLS.
The reason I don't believe the formula you posted accounts for the variance of the individual points is that each point has its own separate variance, and the equation obviously does not have a bunch of separate variances. I would suggest the formula is just the confidence interval for the overall mean. That's not what I need.
Anyway, I don't think it's going to help anyone for us to argue about the equation you posted. We can agree to disagree. It is off topic from my original question. Poisson regression is a standard, recommended method for doing trend analysis. I'm simply asking for someone to show the mechanics of a simple example by hand.
I understand the power and usefulness of pushing a button to carry out a statistical analysis. I never cast any aspersions on any software or person. It hardly seems fair to criticize me for wanting to understand the underlying math for a statistical method. And I never used the words "awful button"; those are your words.
Respectfully, I don't think IRLS has much to do with my question. Again, we can agree to disagree, because your suggestion, while intriguing, is off topic from my post.
If anyone can answer my original specific question, I'd be grateful.
Thanks,
Daniel
Last edited by daniel_goldman; 02-18-2014 at 10:07 PM.
The best example I can think of is that crab example, but it is better explained in Agresti's book An Introduction to Categorical Data Analysis; there is also some SAS code here.
"It would be a big help if someone could give a practical example (by hand) how Poisson regression is used to calculate a time trend line, and calculate a confidence interval for whether there is a trend"
Maybe you're thinking it's done similarly to a chi-squared test in a 2x2 table? In that case, yeah, we can enter some numbers in a numerator and denominator and compare to a table of chi-square values... but the "regression" part of "Poisson regression" means it's not so simple; specifically, there's an optimization of a log-likelihood involved, which requires an iterative numerical solution.
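The log-likelihood being optimized can be written out explicitly. A sketch, assuming the deaths y_i in year i are Poisson with the population N_i entering as an offset and t_i = year − 1994 (these symbols are introduced here, not in the thread):

```latex
y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \log N_i + \beta_0 + \beta_1 t_i

\ell(\beta_0, \beta_1) = \sum_i \left[ y_i \log \mu_i - \mu_i - \log(y_i!) \right]

U(\beta) = \begin{pmatrix} \sum_i (y_i - \mu_i) \\ \sum_i t_i (y_i - \mu_i) \end{pmatrix},
\qquad
I(\beta) = \sum_i \mu_i \begin{pmatrix} 1 & t_i \\ t_i & t_i^2 \end{pmatrix}

\beta^{(k+1)} = \beta^{(k)} + I\big(\beta^{(k)}\big)^{-1} U\big(\beta^{(k)}\big)

\hat\beta_1 \pm 1.96 \sqrt{\big[ I(\hat\beta)^{-1} \big]_{22}}
```

Setting U = 0 gives two equations that are nonlinear in β1 (each μ_i contains e^{β1 t_i}), which is why the solution is iterative. For this log-link model each Newton step is exactly a weighted least squares fit with weights μ_i and working response β0 + β1 t_i + (y_i − μ_i)/μ_i, which is where the name "iteratively reweighted least squares" comes from.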
Do you ask this question for general learning, i.e. to better understand Poisson regression, or is there a specific problem with specific data involved, perhaps the hep data you posted? I ask only because the answer sort of depends on which it is.
Honestly, given what you've said, my answer is a pretty good one; it's at least a reasonable first approach in a real-world analysis... that's partly why I ask your objective, because even though it's a good analytic "first step", it might not be that instructive for learning about Poisson regression. By the way, as I mentioned, "Poisson regression" is a special case of "Generalized Linear Models", and it might be helpful to check out the Wikipedia page about that.
As for the confidence interval formula I posted: I was suggesting you replace the x-bar with the rate for each year ("n" is 1 for each year). Yes, this is a crude approach, mostly because it doesn't pool the information from all available data points, but it will give a large-sample, i.e. approximate, idea of the confidence interval at each year, and it does account for the different variance at each year (the Poisson mean equals the Poisson variance, and the estimate of both the mean and the variance for each year is that year's data point). Again, I know this is a crude approach, but it's the only thing close to a Poisson regression that can be done by hand.
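A sketch of this per-year interval for the Bexar County data. One assumption beyond the post: the formula is applied to each year's death count (where the Poisson mean-equals-variance property holds, with n = 1) and the interval is then rescaled to a rate per 100000, rather than plugging the rate directly into the formula. The function name poisson_rate_ci is hypothetical.

```python
import math

# Bexar County viral hepatitis deaths, 1990-1998, as posted in the thread.
years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998]
events = [7, 9, 8, 18, 14, 19, 36, 49, 30]
population = [1185394, 1213028, 1233289, 1260287, 1279620,
              1299486, 1318431, 1334722, 1351304]


def poisson_rate_ci(count, pop, z=1.96, per=100_000):
    """Large-sample CI count +/- z*sqrt(count) (Poisson variance = mean), as a rate."""
    half = z * math.sqrt(count)
    lo = max(count - half, 0.0)  # a count cannot be negative
    hi = count + half
    scale = per / pop
    return count * scale, lo * scale, hi * scale


for yr, y, n in zip(years, events, population):
    rate, lo, hi = poisson_rate_ci(y, n)
    print(f"{yr}: {rate:.2f} per 100000 (approx. 95% CI {lo:.2f} to {hi:.2f})")
```

With single-digit counts the normal approximation is rough, which fits the description of this as a crude approach.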
Thanks again for your suggestions.
I think my objective was pretty clearly stated. I need to know how to do the underlying math for Poisson regression for doing a time trend. I don't see how the answer depends on "why", as you suggest. The underlying method is just the underlying method, independent of anything else.
I never said or implied Poisson regression was done similar to a chi-squared test, as you suggest.
The only reason I posted the hepatitis data was to make it easier for someone to help, to provide a simple test data set, and one with real data. There is nothing special about those data.
It is certainly not going to help me to refer me to a Wikipedia page. In any case, I am asking for something specific.
Without evidence that there is something special about Poisson regression that cannot be done by hand, I totally disagree that "it's the only thing close to a Poisson regression that can be done by hand". My guess is that, like almost every other statistical procedure I know, Poisson regression is actually quite simple, once one understands it. I agree it is related to GLM. I admit I don't "get it". That is why I am asking. If someone can give a specific answer that shows the steps in a simple example, that would be much appreciated.
Daniel
Poisson regression isn't just related to generalized linear models - it IS a generalized linear model. Note that there is a difference between "general linear models" and "generalized linear models". For general linear models there really is a closed-form solution for the parameter estimates. For generalized linear models in any non-trivial case (anything other than intercept-only models) there is no closed-form solution for the parameter estimates.
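The intercept-only exception can be checked directly. A sketch, assuming population enters as an offset: with log(μ_i) = log(N_i) + b0, the score equation Σ(y_i − N_i·e^{b0}) = 0 rearranges to e^{b0} = Σy_i / ΣN_i, the pooled rate. Once a year term is added, the second score equation Σ t_i(y_i − μ_i) = 0 mixes e^{b1·t_i} for several different t_i, and in general no such rearrangement is available.

```python
import math

# Texas data from the thread.
events = [108, 154, 141, 212, 254, 283, 353, 383, 432]
population = [16986510, 17349000, 17655650, 18031484, 18378185,
              18723991, 19128261, 19439337, 19759614]

# Intercept-only Poisson model: log(mu_i) = log(N_i) + b0.
# The closed-form MLE is the pooled rate: total events / total population.
pooled_rate = sum(events) / sum(population)
b0 = math.log(pooled_rate)

# Check: the score sum(y_i - mu_i) is (numerically) zero at this b0.
score = sum(y - n * math.exp(b0) for y, n in zip(events, population))
print(f"pooled rate = {pooled_rate * 100_000:.3f} per 100000, score = {score:.2e}")
```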
Although the theory is relatively simple to grasp once you know what's going on, I don't know anybody who has actually walked through and done it "by hand". I've coded it up, but I've never gone through with paper and pencil and obtained the estimates for a Poisson regression (or any other type of non-trivial generalized linear model) "by hand".
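For anyone who does want to trace the computation, here is a sketch for the Bexar County data using Newton-Raphson directly on the two score equations, with a likelihood-ratio test of "no trend" added as an alternative to the confidence interval. Assumptions: population as an offset, year centred at 1994, and 3.84 as the 5% chi-square critical value for 1 degree of freedom; the likelihood-ratio test is a swapped-in illustration, not something proposed in the thread, and loglik is a hypothetical helper.

```python
import math

# Bexar County data from the thread.
events = [7, 9, 8, 18, 14, 19, 36, 49, 30]
population = [1185394, 1213028, 1233289, 1260287, 1279620,
              1299486, 1318431, 1334722, 1351304]
t = [yr - 1994 for yr in range(1990, 1999)]  # centred year


def loglik(b0, b1):
    """Poisson log-likelihood, dropping the constant -sum(log(y_i!))."""
    total = 0.0
    for y, n, ti in zip(events, population, t):
        eta = math.log(n) + b0 + b1 * ti
        total += y * eta - math.exp(eta)
    return total


# Null model (no trend): closed-form MLE, the pooled rate.
b0_null = math.log(sum(events) / sum(population))

# Trend model: Newton-Raphson, starting from the null model.
b0, b1 = b0_null, 0.0
for _ in range(50):
    mu = [n * math.exp(b0 + b1 * ti) for n, ti in zip(population, t)]
    u0 = sum(y - m for y, m in zip(events, mu))                 # score for b0
    u1 = sum(ti * (y - m) for ti, y, m in zip(t, events, mu))   # score for b1
    i00 = sum(mu)                                               # information matrix
    i01 = sum(ti * m for ti, m in zip(t, mu))
    i11 = sum(ti * ti * m for ti, m in zip(t, mu))
    det = i00 * i11 - i01 * i01
    d0 = (i11 * u0 - i01 * u1) / det
    d1 = (i00 * u1 - i01 * u0) / det
    b0, b1 = b0 + d0, b1 + d1
    if abs(d0) < 1e-12 and abs(d1) < 1e-12:
        break

lr_stat = 2 * (loglik(b0, b1) - loglik(b0_null, 0.0))
print(f"b1 = {b1:.4f}, LR statistic = {lr_stat:.2f} (compare with 3.84)")
```

Each pass through the loop is one "by hand" iteration: nine exponentials, five sums, and a 2x2 inverse, repeated until the estimates stop changing.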
Are you familiar with maximum likelihood estimation?
I've set you in the right direction; any statistician can see that; I know (from your comments) you're not a statistician, and that's okay. It's fun to help others, regardless of whether they know more or less than me about something. All I can say at this point is good luck. Until I see evidence that you've done some actual research into the topic on your own, I don't think I can be any more help. -Regards