# Testing for trend

#### Markica85

##### New Member
I have the following data, which represent the number of new patients with a rare disease in consecutive 5-year periods: 4, 3, 4, 2, 4, 2, 4, 6, 11

As far as I can see, there is an increasing trend. Is there a more appropriate way to test the significance of the trend in this situation than Spearman correlation?

I also have patient-level data (year of diagnosis for each patient).


#### Markica85

##### New Member
Is it legitimate to analyze only the last 4 time intervals using Spearman rank correlation and to say that the number of patients has been increasing over the last 20 years?


#### Markica85

##### New Member
I performed a Poisson regression using the summarized data (since the raw data consist of one variable only, the year of diagnosis). The dependent variable is the number of observations per interval, and the independent variable is the time interval, coded with numbers from 1 to 9.

I get conflicting results depending on how I enter the time interval. As a categorical variable with 9 levels, the overall result is non-significant, as is each individual time interval. As a single numeric variable, both the overall result and the time-interval coefficient are significant. Can you help me interpret this?

Alternatively, I treated each year as a time interval and ran the same analysis; the result is also significant.
Spearman rank correlation gives a non-significant result for the whole dataset. It is significant for the last four 5-year intervals, where there is a clear increase in the number of diagnoses.
Which approach is correct? Is there evidence of an increasing trend or not?


#### Markica85

##### New Member
Here is a visual representation of my data in 5- and 10-year time intervals.

#### rogojel

##### TS Contributor
hi,
given this data, I think you could build on Greta's idea and test whether the values 6 and 11 are still within the expected range of variation, using the Poisson distribution. I found that the value 6, given the previous values, is possible (p value of 0.065), but the value 11 is pretty much impossible assuming there was no change in the process (p value of 0.0009, even including the 6 as a legitimate value).

Whether this is a change in the trend or a one-off event is impossible to tell without future observations.
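The kind of check described in this post can be sketched in R as follows. The post does not say how its p values were obtained, so this is only an illustration under one assumption: the baseline rate is taken as the mean of the counts observed before the value being checked. Both upper-tail and point probabilities are shown, since either could be what was reported.

```r
# Counts per 5-year period, copied from the first post
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)

# Assumption: baseline rate = mean of the counts seen before each checked value
lambda_7 <- mean(y[1:7])   # rate estimated from the first 7 periods
lambda_8 <- mean(y[1:8])   # rate estimated from the first 8 periods (includes the 6)

# Upper-tail probabilities P(X >= k) under a constant Poisson rate
p_tail_6  <- ppois(5,  lambda_7, lower.tail = FALSE)
p_tail_11 <- ppois(10, lambda_8, lower.tail = FALSE)

# Point probabilities P(X = k), for comparison
p_point_6  <- dpois(6,  lambda_7)
p_point_11 <- dpois(11, lambda_8)

print(signif(c(tail_6 = p_tail_6, tail_11 = p_tail_11,
               point_6 = p_point_6, point_11 = p_point_11), 3))
```

Under this assumption the point probabilities come out close to the two values quoted above, while the upper-tail probability for the 6 is larger and the one for the 11 remains very small. Either way the qualitative reading is the same: the 6 is unremarkable, the 11 is not.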

regards

#### Markica85

##### New Member
Thank you, but what is bothering me is this: can time be added as a numerical variable, and can the p value be interpreted as indicating a significant trend?

#### GretaGarbo

##### Human
> Thank you, but what is bothering me is this: can time be added as a numerical variable, and can the p value be interpreted as indicating a significant trend?

Yes, time can be included as an explanatory variable (and it will be statistically significant as shown below).

But that model is suggested by the data. It is not a hypothesis that is suggested to you from other considerations. So I don't think that most people would bet that the next value would continue to be higher. Something specific could have happened during this specific time period. (Maybe the pattern would have been more clear with yearly data.)

R code below:

Code:
```
> #---
> y2 <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
> time2 <- 1:9
> plot(time2, log(y2))
> time.factor <- as.factor(c(rep(1, 7), 2, 2))
> time.factor
[1] 1 1 1 1 1 1 1 2 2
Levels: 1 2
> ma <- glm(y2 ~ time2, family = poisson(link = log))
> summary(ma)

Call:
glm(formula = y2 ~ time2, family = poisson(link = log))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.44612  -0.68419  -0.07756   0.47372   1.25204

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.71202    0.40965   1.738   0.0822 .
time2        0.14256    0.06378   2.235   0.0254 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 11.4207  on 8  degrees of freedom
Residual deviance:  6.2175  on 7  degrees of freedom
AIC: 39.408

Number of Fisher Scoring iterations: 4

> mb <- glm(y2 ~ time.factor, family = poisson(link = log))
> summary(mb)

Call:
glm(formula = y2 ~ time.factor, family = poisson(link = log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9057  -0.7653   0.3809   0.3809   0.8199

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.1896     0.2085   5.705 1.16e-08 ***
time.factor2   0.9505     0.3198   2.972  0.00296 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 11.421  on 8  degrees of freedom
Residual deviance:  3.270  on 7  degrees of freedom
AIC: 36.461

Number of Fisher Scoring iterations: 4
```
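One way to look at the earlier question about conflicting results (time as a 9-level categorical variable versus time as a single numeric variable) is to compare both models against a constant-rate model. The sketch below is only an illustration; the model names `m_null`, `m_lin` and `m_cat` are made up for this example. The categorical model uses one parameter per period, so it spends 8 degrees of freedom asking "is any period different?", whereas the numeric model spends 1 degree of freedom on a log-linear trend.

```r
y2    <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
time2 <- 1:9

m_null <- glm(y2 ~ 1,             family = poisson(link = log))  # constant rate
m_lin  <- glm(y2 ~ time2,         family = poisson(link = log))  # log-linear trend (1 df)
m_cat  <- glm(y2 ~ factor(time2), family = poisson(link = log))  # one level per period (8 df)

# Likelihood-ratio (deviance) test of a model against the constant-rate model
lr_p <- function(small, big) {
  pchisq(deviance(small) - deviance(big),
         df = df.residual(small) - df.residual(big),
         lower.tail = FALSE)
}

p_lin <- lr_p(m_null, m_lin)   # trend vs. no trend
p_cat <- lr_p(m_null, m_cat)   # "any difference between periods" vs. none

print(c(trend = p_lin, categorical = p_cat))
print(df.residual(m_cat))      # 0: the 9-level model reproduces the data exactly
```

With these nine counts the 1-df trend test falls below 0.05 while the 8-df categorical test does not. The two tests answer different questions, so they do not really conflict; the caveat about data-suggested hypotheses still applies to the trend test.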

#### Markica85

##### New Member
Thank you, you've been very helpful. These data describe cases of a rare childhood tumor that can be sporadic or related to radiation exposure, and unfortunately the hypothesis can be generated by other considerations. The next year, which is not included in the bar chart (it is not a full 5-year period), already had 6 new cases.

#### GretaGarbo

##### Human
> These data describe cases of a rare childhood tumor that can be sporadic or related to radiation exposure

This statement (the rare event) makes it more plausible to use a Poisson model.

> ... and unfortunately the hypothesis can be generated by other considerations.

If your hypothesis is suggested by the data, that weakens the inference or conclusion you can draw.
(Imagine throwing 20 dice, noticing that two of them show a 6, and claiming that there is something special about those two. Of course that is nonsense: it is a hypothesis suggested by the data.)

But if you have other reasons to believe that children nowadays are exposed to more radiation, then you have a "genuine" hypothesis, which is what statistical inference is built upon.
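To put a number on the dice illustration: assuming 20 fair, independent dice, the chance of seeing at least two sixes can be computed from the binomial distribution. This is just a small sketch of why such an observation is no evidence of anything special.

```r
n_dice <- 20      # number of dice thrown
p_six  <- 1 / 6   # probability of a six on one fair die

# P(at least two sixes) = 1 - P(zero or one six)
p_at_least_two <- pbinom(1, size = n_dice, prob = p_six, lower.tail = FALSE)
print(round(p_at_least_two, 3))
```

The result is roughly 0.87, so two or more sixes is the expected outcome rather than a surprise.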

> The next year, which is not included in the bar chart (it is not a full 5-year period), already had 6 new cases.

Enter your data as yearly counts, in the format below (one value for each single year from 1970). Then we will run it for you.

Code:
```
y2 <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
```
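If the data are stored as one year of diagnosis per patient, yearly counts can be derived along the lines of the sketch below. The `diag_year` values are hypothetical placeholders, not the real data from this thread, and the end year of the window is also an assumption; only the 1970 start comes from the request above.

```r
# Hypothetical placeholder values, NOT the real data from this thread
diag_year <- c(1971, 1971, 1974, 1980, 1980, 1980, 1992)

# Assumed observation window; replace with the actual first and last year
years <- 1970:1994

# factor() with explicit levels keeps the years that had zero diagnoses,
# which a plain table(diag_year) would silently drop
yearly_counts <- as.vector(table(factor(diag_year, levels = years)))
names(yearly_counts) <- years

print(yearly_counts)
```

Keeping the zero-count years matters for a Poisson model, because a year with no diagnoses is still an observation.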

#### noetsi

##### Fortran must die
In time series regression you commonly test time variables where each period is the x, so the first period has an x of 1, the next period x is 2 and so on. If the time variable is significant then there is a linear trend (so time matters).

I am not sure that applies here.
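For reference, the time-index regression described above would look roughly like this when applied to the nine counts from the first post. This is only a sketch using ordinary least squares; it ignores the fact that the response is a small count, which is one reason it may not be the best fit here.

```r
y      <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)
period <- seq_along(y)          # time index: 1, 2, ..., 9

fit <- lm(y ~ period)           # linear trend in the raw counts

slope   <- coef(fit)[["period"]]
p_slope <- summary(fit)$coefficients["period", "Pr(>|t|)"]

print(c(slope = slope, p_value = p_slope))
```

For these data the least-squares slope is positive, but its p value lands between 0.05 and 0.10, so this version of the trend test is less decisive than the Poisson regression shown earlier in the thread.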

#### Miner

##### TS Contributor
Another approach, which would not be common in your field but is very common in industrial statistics, is to use a control chart to detect abnormal deviations from the background noise inherent to a process. Using this approach, your first 8 subgroups of data represent the inherent variation, while the final subgroup denotes an unusual increase.

There are three types of control charts you may use.
1. U chart for Poisson data from varying sample sizes
2. C chart for Poisson data from constant sample sizes
3. Individuals chart when Poisson data meet the assumptions for the Normal approximation
Since you did not provide the sample sizes, I used the second and third charts to illustrate.

Another possibility is a G chart for the time between rare events.
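The C chart limits can be reproduced with a few lines of R. This is a minimal sketch, assuming the first 8 periods are used as the baseline and the usual 3-sigma limits for Poisson counts (center line plus or minus three times the square root of the center line); it is not necessarily the exact setup behind the charts mentioned above.

```r
y <- c(4, 3, 4, 2, 4, 2, 4, 6, 11)

# Assumption: the first 8 periods define the in-control baseline
baseline <- y[1:8]
c_bar    <- mean(baseline)                    # center line

# 3-sigma limits for a Poisson count (variance equals the mean)
ucl <- c_bar + 3 * sqrt(c_bar)
lcl <- max(0, c_bar - 3 * sqrt(c_bar))        # counts cannot go below zero

out_of_control <- which(y > ucl | y < lcl)    # periods outside the limits

print(c(center = c_bar, lcl = lcl, ucl = ucl))
print(out_of_control)
```

With this baseline only the ninth period (the count of 11) falls above the upper control limit.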

#### noetsi

##### Fortran must die
Interestingly, as far as I know, you can't use common time series approaches such as ARIMA or exponential smoothing to determine whether there is a statistically significant pattern, even in past data. You can forecast future results, but these approaches won't tell you whether the pattern that existed in the past is statistically significant; they effectively assume that the past values are real population results.

Also, 50 data points are usually required for time series analysis.