I have the following data, which represent the number of new patients with a rare disease in consecutive 5-year periods: 4, 3, 4, 2, 4, 2, 4, 6, 11
As far as I can see, there is an increasing trend. Is there a more appropriate way to test the significance of the trend in this situation than Spearman correlation?
I also have patient-level data (year of diagnosis for each patient).
Last edited by Markica85; 02-25-2017 at 12:28 PM.
Try a Poisson model.
Markica85 (02-25-2017)
Is it legitimate to analyze only the last 4 time intervals using Spearman rank correlation and to say that the number of patients has been increasing over the last 20 years?
Last edited by Markica85; 02-25-2017 at 03:08 PM.
I performed a Poisson regression using the summarized data (since the raw data consist of one variable only: year of diagnosis). The dependent variable is the number of observations per interval, and the independent variable is the time interval, coded with numbers from 1 to 9.
I get conflicting results. If I use time interval as a categorical variable with 9 levels, the overall result is non-significant, as is the result for each individual time interval. If I simply enter time interval as a numeric variable, both the overall result and the time interval coefficient are significant. Can you help me interpret this?
Alternatively, I treated each year as a time interval and ran the same analysis; the result is also significant.
Spearman rank correlation gives a non-significant result for the whole dataset. It is significant for the last four 5-year intervals, where there is a clear increase in the number of diagnoses.
Which approach is correct? Is there evidence of an increasing trend or not?
Last edited by Markica85; 02-25-2017 at 03:18 PM.
Here is a visual representation of my data in 5- and 10-year time intervals.
hi,
given this data, I think you could build on Greta's idea and test whether the values 6 and 11 are still within the expected range of variation, using the Poisson distribution. I found that the value 6, given the previous values, is possible (p value of 0.065), but the value 11 is pretty much impossible assuming there was no change in the process (p value of 0.0009, even including the 6 as a legit value).
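A minimal sketch of this kind of check in R, assuming the baseline rate is simply the mean of the earlier counts (the variable names are made up here). The quoted figures of 0.065 and 0.0009 appear to correspond to the point probabilities P(X = 6) and P(X = 11); the upper-tail probabilities P(X >= 6) and P(X >= 11), which are the more conventional p-values, are also computed and come out larger.

```r
# Counts per 5-year interval, as given in the first post
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)

# Assumption: the first 7 intervals represent the unchanged process
lambda7 <- mean(y[1:7])
p_point_6 <- dpois(6, lambda7)                       # P(X = 6)
p_tail_6  <- ppois(5, lambda7, lower.tail = FALSE)   # P(X >= 6)

# Baseline that also treats the 6 as a legitimate value
lambda8 <- mean(y[1:8])
p_point_11 <- dpois(11, lambda8)                     # P(X = 11)
p_tail_11  <- ppois(10, lambda8, lower.tail = FALSE) # P(X >= 11)

print(signif(c(point_6 = p_point_6, tail_6 = p_tail_6,
               point_11 = p_point_11, tail_11 = p_tail_11), 3))
```

Note that the baseline is estimated from only 7 or 8 counts, so these probabilities ignore the uncertainty in the estimated rate.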
Whether this is a change in the trend or a one-off event is impossible to tell without future observations.
regards
Markica85 (02-26-2017)
Thank you, but what is bothering me is this: can time be added as a numerical variable, and can its p-value be interpreted as evidence of a significant trend?
Yes, time can be included as an explanatory variable (and it will be statistically significant as shown below).
But that model is suggested by the data. It is not a hypothesis that was suggested to you by other considerations. So I don't think most people would bet that the next value will continue to be higher. Something specific could have happened during this particular time period. (Maybe the pattern would be clearer with yearly data.)
R code below:
Spoiler:
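A minimal sketch of the kind of model discussed in this post, not the poster's original code. It assumes the nine counts from the first post, with the interval coded 1 to 9 (the variable name `period` is made up here). Time is fitted both as a numeric trend and as a 9-level factor. The factor model uses 8 degrees of freedom and reproduces the counts exactly, so its overall test is spread over every possible pattern, while the numeric model tests one specific pattern (a log-linear trend) with 1 degree of freedom. That may be why the two analyses give different answers.

```r
# Counts per 5-year interval and the interval index 1..9
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
period <- seq_along(y)

fit_null   <- glm(y ~ 1,              family = poisson)  # constant rate
fit_trend  <- glm(y ~ period,         family = poisson)  # log-linear trend
fit_factor <- glm(y ~ factor(period), family = poisson)  # one level per interval

# Wald test for the trend coefficient
print(summary(fit_trend)$coefficients)

# Likelihood-ratio tests against the constant-rate model
lr_trend  <- deviance(fit_null) - deviance(fit_trend)
lr_factor <- deviance(fit_null) - deviance(fit_factor)
p_trend  <- pchisq(lr_trend,  df = 1, lower.tail = FALSE)
p_factor <- pchisq(lr_factor, df = 8, lower.tail = FALSE)
print(c(p_trend = p_trend, p_factor = p_factor))
```

A small p-value for `period` says the counts are hard to reconcile with a constant rate under this model; it does not by itself show that the rise will continue.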
Markica85 (02-28-2017)
Thank you, you've been very helpful. These data describe cases of a rare childhood tumor that can be sporadic or related to radiation exposure and, unfortunately, the hypothesis can be generated from other considerations. The following year, which is not included in the bar chart (it is not a full 5-year period), already had 6 new cases.
This statement (the rare event) makes it more plausible to use a Poisson model.
If your hypothesis is suggested by the data, that weakens the inference (the conclusion) you can draw.
(Imagine throwing 20 dice, noticing that two of them show a 6, and claiming that there is something special about those two. Of course that is nonsense: it is a hypothesis suggested by the data.)
But if you have other reasons to believe that children nowadays are exposed to more radiation, then you have a "genuine" hypothesis, which is what statistics is built upon.
Enter your yearly data in the format below (all single-year counts from 1970), and we will run it for you.
Code:
y2 <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
Markica85 (03-01-2017)
In time series regression you commonly test time variables where each period is the x, so the first period has an x of 1, the next period x is 2 and so on. If the time variable is significant then there is a linear trend (so time matters).
I am not sure that applies here.
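For illustration only, the time-index regression described above can be sketched in R as follows, assuming the nine 5-year counts from the first post and an ordinary least-squares fit (the name `period` is made up here). With counts this small, the normal-error assumption behind `lm` is doubtful, so the output should be read with caution.

```r
# Counts per 5-year interval and the time index x = 1, 2, ..., 9
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
period <- seq_along(y)

# Ordinary least-squares linear trend
fit_lm <- lm(y ~ period)

# Slope estimate, standard error, t statistic and p-value
print(summary(fit_lm)$coefficients)
```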
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Markica85 (03-01-2017)
Another approach, which would not be common in your field but is very common in industrial statistics, is to use a control chart to detect abnormal deviations from the background noise inherent to a process. Using this approach, your first 8 subgroups of data represent the inherent variation, while the final subgroup denotes an unusual increase.
There are three types of control charts you may use:
- U chart for Poisson data from varying sample sizes
- C chart for Poisson data from constant sample sizes
- Individuals chart when Poisson data meet the assumptions for the Normal approximation
Since you did not provide the sample sizes, I used the second and third charts to illustrate.
Another possibility is a G chart for the time between rare events.
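A minimal sketch of the C chart limits in R, assuming the first 8 intervals are used as the baseline and the usual 3-sigma limits for Poisson counts (center line c-bar, limits c-bar ± 3·sqrt(c-bar), lower limit truncated at 0). This is not the chart from the original post, and the variable names are made up here.

```r
# Counts per 5-year interval, as given in the first post
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)

# Assumption: the first 8 intervals define the in-control process
baseline <- y[1:8]
c_bar <- mean(baseline)

# 3-sigma limits for a C chart (Poisson variance equals the mean)
ucl <- c_bar + 3 * sqrt(c_bar)
lcl <- max(0, c_bar - 3 * sqrt(c_bar))

# Intervals falling outside the control limits
signals <- which(y > ucl | y < lcl)
print(c(center = c_bar, lcl = lcl, ucl = ucl))
print(signals)
```

Under these assumptions only the final interval falls above the upper limit.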
GretaGarbo (02-28-2017), Markica85 (03-01-2017)
Interestingly, as far as I know, you can't use common time series approaches such as ARIMA or exponential smoothing to determine whether there is a statistically significant pattern, even in past data. You can forecast future results, but these approaches won't tell you whether the pattern that existed in the past is statistically significant; they effectively assume that the observed results are the real population results.
Also, 50 points are usually required for time series.
Markica85 (03-01-2017)