# How to statistically compare the trend similarity of two distinct time series?

#### Cláudio Silva

##### New Member
I have two NDVI (Normalized Difference Vegetation Index) time series from the same area but collected with different sensors. Since the sensors have different sensitivities to NDVI and different data-collection intervals, I need to test whether both time series follow similar trends. I tried the t-test and the Granger causality test, but the results are inconclusive.

#### noetsi

##### No cake for spunky
The Granger test checks whether one time series helps predict another ("causes" is the word used, but it is not causation in the classical sense). All the time series methods I know look at the impact of one series on another. I am not sure what a similar trend even means in a statistical sense. I have never seen a test of whether one trend is the same as another - what would you test?

Can you even use a t-test on a non-stationary series?

#### Cláudio Silva

##### New Member
The trend is similar, but the values differ because of the different sensitivities of the sensors. Basically, what I want to test is the increase and decrease of two NDVI time series through time: when A increases in a time interval, does B increase? And when A decreases, does B decrease?

We would assume stationarity, since we are looking at a specific area over time.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Can you visualize it and share that? What about cross-correlation, or the approach that tests whether one series is a lag of the other? Also, given the shape, perhaps basic OLS can provide some info.

Lastly, sometimes things just are "inconclusive", right? Gotta watch out for confirmation bias.
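The cross-correlation idea above can be sketched quickly. This is a minimal illustration in Python/NumPy rather than R, using made-up toy series: it scans the Pearson correlation across a range of lags and reports the best one.

```python
import numpy as np

def cross_correlation(a, b, max_lag=10):
    """Pearson correlation of a against b shifted by each lag in [-max_lag, max_lag].
    A negative best lag means b trails (lags behind) a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = a[:lag], b[-lag:]
        elif lag > 0:
            x, y = a[lag:], b[:-lag]
        else:
            x, y = a, b
        out[lag] = float(np.corrcoef(x, y)[0, 1])
    return out

# Toy data: b is a rescaled copy of a, delayed by 2 steps, plus noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = np.roll(a, 2) * 0.5 + rng.normal(0, 0.05, 200)
cc = cross_correlation(a, b, max_lag=5)
best_lag = max(cc, key=cc.get)  # expected: -2
```

In R the equivalent one-liner would be `ccf()`; the point of the sketch is just that "one is a lag of the other" reduces to finding where the lagged correlation peaks.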

#### noetsi

##### No cake for spunky
You can't have a trend if you are stationary.

If you want to test whether they are correlated while both trend in time, there are no simple ways to do this; all such methods are complex, including the issue of cointegration. If you know ARIMA, I would prewhiten each series and then run a correlation analysis on the two residual series. auto.arima in R's forecast package is a quick way to prewhiten them; you have to do it once for each series. Or you can try an autoregressive distributed lag (ARDL) model, although if you don't know those methods and don't know R or SAS (or possibly Stata, which I am not familiar with) this will not be easy. The other approaches, like vector error correction models, are scary hard.
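The prewhiten-then-correlate idea can be illustrated outside R as well. Below is a minimal Python/NumPy sketch that uses only an AR(1) fit as a crude stand-in for auto.arima (which would choose the model order automatically). It shows the classic pitfall: two completely independent random walks can look strongly correlated, but after prewhitening each series the correlation collapses toward zero.

```python
import numpy as np

def ar1_prewhiten(x):
    """Fit x[t] ~ phi * x[t-1] by least squares and return the residuals.
    A crude stand-in for auto.arima-style prewhitening, limited to AR(1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    return x[1:] - phi * x[:-1]

rng = np.random.default_rng(42)
a = np.cumsum(rng.normal(size=300))  # two *independent* random walks:
b = np.cumsum(rng.normal(size=300))  # any correlation between them is spurious

raw_r = float(np.corrcoef(a, b)[0, 1])  # can be large purely by chance
white_r = float(np.corrcoef(ar1_prewhiten(a),
                            ar1_prewhiten(b))[0, 1])  # near zero after prewhitening
```

The raw correlation between trending series says little; the correlation between the whitened residuals is what carries information about genuine co-movement.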

#### Cláudio Silva

##### New Member
> Can you visualize it and share that? What about cross-correlation, or the approach that tests whether one series is a lag of the other? Also, given the shape, perhaps basic OLS can provide some info.
>
> Lastly, sometimes things just are "inconclusive", right? Gotta watch out for confirmation bias.
Thanks for the answer. There's an example plot below.

Both options you mention seem plausible, and I will do my research and possibly try them. However, I'm not interested in whether one series is a lag of the other, but rather in whether one time series (blue line) follows the same trend as the other (red line) at the present time. This could imply a lag, but the lag should apply to both time series.

When we look at the two time series we would say "they seem to have the same pattern!". How can we statistically quantify this pattern - not the similarity between values, but the pattern/trend itself?

Hypothetically, the desired output would be a p-value telling me whether the trend of one time series (blue line), over, say, 500 days (500 record points), is significantly different from the trend of the other (red line) or not.

I'm sorry for any inconsistencies in my writing; as a biologist, I learn statistics as I need it.

#### Attachments

• Example plot of the two NDVI series (15.6 KB)

#### Cláudio Silva

##### New Member
> You can't have a trend if you are stationary.
>
> If you want to test whether they are correlated while both trend in time, there are no simple ways to do this; all such methods are complex, including the issue of cointegration. If you know ARIMA, I would prewhiten each series and then run a correlation analysis on the two residual series. auto.arima in R's forecast package is a quick way to prewhiten them; you have to do it once for each series. Or you can try an autoregressive distributed lag (ARDL) model, although if you don't know those methods and don't know R or SAS (or possibly Stata) this will not be easy. The other approaches, like vector error correction models, are scary hard.

If I'm stationary, can I have a pattern instead of a trend?

I don't know ARIMA, but it has been one of the most frequently mentioned statistical methods for analysing trends (so far in my research), and I will also consider the autoregressive distributed lag model as an option. I have been using R for a year, so I think I will be able to apply those methods.

#### noetsi

##### No cake for spunky
There is, as best I can tell, a really good ARDL tool in R, built recently by a doctoral student from Greece. I will try to find it. It just does ARDL, which has some really nice features for building models (for example, the variables do not have to be integrated of the same order, which they often must be in time series).

If you are stationary in ARIMA, you can analyze the correlation between two series. I am not sure what you mean by a pattern. If you do ARIMA, you should know that some think the classical ways of detecting p, d, q for regular or seasonal ARIMA don't work well with mixed models. And high-order p or q models often reduce to a lower-order model. Generally, you should be cautious in the social sciences if you get a parameter order beyond 2 for ARIMA. That is why I suggest auto.arima.

#### noetsi

##### No cake for spunky
Btw, if anyone is interested - funny for me to say this - the forecast package in R is very useful. I use it to do all my ARIMA work. The way SAS handles this is pretty confusing to me. I do not do comparisons between two time series (correlation). I have long wanted to, but the more I read about this issue, the more uncertain I grow about doing it.

#### Cláudio Silva

##### New Member
> There is, as best I can tell, a really good ARDL tool in R, built recently by a doctoral student from Greece. I will try to find it. It just does ARDL, which has some really nice features for building models (for example, the variables do not have to be integrated of the same order, which they often must be in time series).
>
> If you are stationary in ARIMA, you can analyze the correlation between two series. I am not sure what you mean by a pattern. If you do ARIMA, you should know that some think the classical ways of detecting p, d, q for regular or seasonal ARIMA don't work well with mixed models. And high-order p or q models often reduce to a lower-order model. Generally, you should be cautious in the social sciences if you get a parameter order beyond 2 for ARIMA. That is why I suggest auto.arima.
Thank you so much for your help. It's been quite difficult to find solutions to this question.

If you could tell me the name of the ARDL package recently built in R, I would be very grateful.

I might have further questions as I go, which I'll share here.

Really appreciate your time and attention in helping me.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
They are 'distinct' series, but do you suspect they are caused by the same latent variable?

#### noetsi

##### No cake for spunky
Nothing I do deals with latent variables although I know hlsmith was not talking to me.

It will take some time to find the ARDL package. I will try to post it tomorrow. You want to look at the documentation for the forecast package in R. I do not try to compare different time series (I spent years working on this, then decided it was too complex and involved too many issues). I only use ARIMA to predict univariate time series.

#### Cláudio Silva

##### New Member
> They are 'distinct' series, but do you suspect they are caused by the same latent variable?
I'm not sure I understood the question. The two time series come from two different sensors that measure the same thing (NDVI). However, the measurements differ because of the different sensor sensitivities (both datasets are legitimate). A common latent variable does not exist in this case.

#### Cláudio Silva

##### New Member
> Nothing I do deals with latent variables, although I know hlsmith was not talking to me.
>
> It will take some time to find the ARDL package. I will try to post it tomorrow. You want to look at the documentation for the forecast package in R. I do not try to compare different time series (I spent years working on this, then decided it was too complex and involved too many issues). I only use ARIMA to predict univariate time series.
Well, the comparison of univariate time series is what I have on my hands right now. I was surprised by the lack of information available on this topic. I guess that reflects your point about its complexity.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I think time series can fall into dynamic systems that become chaotic. So even though we think the financial sector has figured them out, this area is still fairly unestablished. Recurrent neural networks are now regularly used, but I am sure they haven't figured out the problem either.

I guess it didn't click that these series measure the same thing. So what will you do with the results if you ever get them? Is one sensor considered the gold standard, or does it cost more? Can't you just look at the correlation between them, or fit splines to both and compare the degrees of freedom? Or use piecewise/segmented regression? Or smooth them, add confidence intervals, and look at the overlaps?
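The smooth-and-compare suggestion could look something like the following Python/NumPy sketch. The sensor series are toy stand-ins, and the confidence band is a rough normal approximation that ignores autocorrelation, so it is for illustration only. Since Pearson correlation is invariant to scale and offset, comparing the smoothed series with it compares shapes, which suits sensors with different sensitivities.

```python
import numpy as np

def rolling_mean_band(x, window=15, z=1.96):
    """Centered moving average with a rough +/- z * SE band.
    The band ignores autocorrelation, so it is optimistic -- illustration only."""
    x = np.asarray(x, dtype=float)
    k = np.ones(window) / window
    mean = np.convolve(x, k, mode="valid")
    msq = np.convolve(x**2, k, mode="valid")
    sd = np.sqrt(np.maximum(msq - mean**2, 0.0))  # rolling standard deviation
    half = z * sd / np.sqrt(window)
    return mean, mean - half, mean + half

# Toy sensors sharing one seasonal signal at different scales and noise levels
rng = np.random.default_rng(3)
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 100)
s1 = signal + rng.normal(0, 0.1, t.size)              # precise sensor
s2 = 0.6 * signal + 0.2 + rng.normal(0, 0.4, t.size)  # noisier, rescaled sensor

m1, lo1, hi1 = rolling_mean_band(s1)
m2, lo2, hi2 = rolling_mean_band(s2)
# Correlation of the smoothed series compares their *shapes*,
# ignoring the level and scale differences between sensors
trend_r = float(np.corrcoef(m1, m2)[0, 1])
```

Plotting `m1` and `m2` with their bands is the visual version of the overlap check; `trend_r` is the one-number summary of how similar the smoothed trends are.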

#### Cláudio Silva

##### New Member
> I think time series can fall into dynamic systems that become chaotic. So even though we think the financial sector has figured them out, this area is still fairly unestablished. Recurrent neural networks are now regularly used, but I am sure they haven't figured out the problem either.
>
> I guess it didn't click that these series measure the same thing. So what will you do with the results if you ever get them? Is one sensor considered the gold standard, or does it cost more? Can't you just look at the correlation between them, or fit splines to both and compare the degrees of freedom? Or use piecewise/segmented regression? Or smooth them, add confidence intervals, and look at the overlaps?
Well, I will return to what brought me here, to clarify. Again, I have the same data (NDVI) from two different sensors (1 and 2). One of the sensors (1) is much more precise than the other (2), and for that reason I want to validate the trend of sensor (2) against (1). Remember that it is not the numeric values that interest me but rather the trend, because although (2) is less precise, its values are not wrong - different sensors simply have different sensitivities to NDVI.

When plotted, (2) is super "noisy", so I applied seventeen different interpolations and two smoothing filters (moving average and Savitzky-Golay) with different spans. In total, I got a bunch of time series. Now I want to know which trend from (2) is most similar to the trend of (1).
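One hedged way to pick among smoothed candidates is to score each one by how well its ups and downs track sensor (1), e.g. the correlation of first differences, which rewards co-movement rather than value agreement. A Python/NumPy sketch with toy data, using only moving-average candidates (the names and window sizes here are made up for illustration):

```python
import numpy as np

def trend_similarity(ref, candidate):
    """Correlation of first differences: rewards co-movement (ups and downs),
    not agreement in absolute values."""
    return float(np.corrcoef(np.diff(ref), np.diff(candidate))[0, 1])

def moving_average(x, window):
    return np.convolve(x, np.ones(window) / window, mode="valid")

# Toy stand-ins for the two sensors
rng = np.random.default_rng(7)
t = np.arange(300)
signal = np.sin(2 * np.pi * t / 75)
sensor1 = signal + rng.normal(0, 0.02, t.size)       # precise reference
sensor2 = 0.5 * signal + rng.normal(0, 0.5, t.size)  # noisy sensor

# Score each candidate smoothing of sensor 2 against sensor 1
scores = {}
for w in (5, 11, 21):
    smoothed = moving_average(sensor2, w)
    offset = (sensor2.size - smoothed.size) // 2  # center-align the 'valid' output
    scores[f"MA{w}"] = trend_similarity(
        sensor1[offset:offset + smoothed.size], smoothed
    )
best = max(scores, key=scores.get)
```

The same scoring loop would apply unchanged to the Savitzky-Golay and interpolated candidates; just be aware that differencing amplifies noise, so heavier smoothing tends to score better up to the point where it starts erasing the real trend.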

Answering your question: if I get relevant and reliable results, I will be able to apply "the method" to validate the trend of other time series. Note that the validation will always need precise data collected in the field (1).
I understand the spline fitting, which could be helpful, but how would comparing degrees of freedom help me?
As for piecewise/segmented regression, I have to look up the definition to remember what it is and whether it is suitable.

Thank you

#### noetsi

##### No cake for spunky
> Well, the comparison of univariate time series is what I have on my hands right now. I was surprised by the lack of information available on this topic. I guess that reflects your point about its complexity.
There is tons of stuff on both. It just takes a lot of time to learn where to find it. The problem of comparing two time series is very complex, and there are many disagreements about it. Also, univariate models do a better job of predicting the future than multivariate models. Different disciplines take different approaches: economics tends toward cointegration and VAR or VECM models; in some social science fields they use ARDL or regression with ARMA errors, among others. SEM has its own approach to time series, and so do multilevel models. Any time you have data with trends in it, regression is very tricky.

I have a tome I can send you from years of reading on this. It is not well organized and is geared to SAS, where I did most of my work until recently. Also, I am not a statistician like hlsmith, so be cautious with whatever is in there - it could be wrong.

#### noetsi

##### No cake for spunky
Segmented regression applies a different regression to each slice of the data. It is one way to approach time series: essentially, you analyze different portions of the series with different regressions. Among other things, I suspect this won't deal with autocorrelation.

One thing that may or may not apply to your data is structural breaks, where the pattern changes dramatically. There are ways to test for this if you know where to look for the break.
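For a break at a known location, the classic test is the Chow test: fit one linear trend to the whole series, fit separate trends on each side of the candidate break, and compare the residual sums of squares with an F statistic. A minimal Python/NumPy sketch on toy data (it assumes independent errors, which real time series usually violate, so treat the F value as indicative):

```python
import numpy as np

def chow_test(y, break_idx):
    """Chow test for a structural break in a linear trend at a known index.
    Returns the F statistic (reference: F(k, n - 2k) with k = 2 parameters).
    Assumes independent errors -- a strong assumption for time series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)

    def ssr(tt, yy):
        X = np.column_stack([np.ones_like(tt), tt])
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        r = yy - X @ beta
        return float(r @ r)

    s_all = ssr(t, y)
    s_split = ssr(t[:break_idx], y[:break_idx]) + ssr(t[break_idx:], y[break_idx:])
    k, n = 2, y.size
    return ((s_all - s_split) / k) / (s_split / (n - 2 * k))

rng = np.random.default_rng(5)
t = np.arange(200, dtype=float)
trend = np.where(t < 100, 0.01 * t, 1.0 - 0.005 * (t - 100))  # slope flips at t=100
y_break = trend + rng.normal(0, 0.1, 200)
y_smooth = 0.01 * t + rng.normal(0, 0.1, 200)                 # no break

F_break = chow_test(y_break, 100)    # large -> strong evidence of a break
F_smooth = chow_test(y_smooth, 100)  # small -> no evidence of a break
```

If the break location is unknown, there are supremum-type variants that scan candidate break points, but those need adjusted critical values.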

#### Cláudio Silva

##### New Member
> There is tons of stuff on both. It just takes a lot of time to learn where to find it. The problem of comparing two time series is very complex, and there are many disagreements about it. Also, univariate models do a better job of predicting the future than multivariate models. Different disciplines take different approaches: economics tends toward cointegration and VAR or VECM models; in some social science fields they use ARDL or regression with ARMA errors, among others. SEM has its own approach to time series, and so do multilevel models. Any time you have data with trends in it, regression is very tricky.
>
> I have a tome I can send you from years of reading on this. It is not well organized and is geared to SAS, where I did most of my work until recently. Also, I am not a statistician like hlsmith, so be cautious with whatever is in there - it could be wrong.
As a biologist, I thought that comparing the trends of two distinct time series would be fairly simple. My first idea was a "statistical method" that would look at consecutive data points (A and B; B and C; C and D...) at the corresponding times (0 and 1; 1 and 2; 2 and 3...). Each pair would be subtracted to produce a value, hypothetically called an "increase rate" (+) or "decrease rate" (-). Those values would then be compared as a whole between the two time series, generating a p-value that tells us whether the trends are significantly different or not. Maybe I'm being too naive given my lack of statistical knowledge, but apparently it is not as simple as you mentioned.
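The increase/decrease idea described above is essentially a sign test on first differences, and it can be made concrete. A minimal Python sketch (NumPy plus the standard library): count the steps where both series move in the same direction and compare that count to the 50% expected under no relationship with an exact binomial p-value. The caveat, which is why this is a rough screen rather than a formal test, is that the binomial test assumes independent steps, and autocorrelated series violate that, making the p-value optimistic.

```python
from math import comb

import numpy as np

def sign_agreement_pvalue(a, b):
    """Fraction of steps where both series move in the same direction, plus a
    two-sided exact binomial p-value against 50% chance agreement.
    Assumes independent steps -- autocorrelation makes the p-value optimistic."""
    da, db = np.diff(a), np.diff(b)
    n = da.size
    agree = int(np.sum(np.sign(da) == np.sign(db)))
    # Exact binomial tail probabilities under p = 0.5
    upper = sum(comb(n, k) for k in range(agree, n + 1)) / 2**n
    lower = sum(comb(n, k) for k in range(agree + 1)) / 2**n
    return agree / n, min(1.0, 2 * min(upper, lower))

# Toy series sharing a trend but with different scale and offset
rng = np.random.default_rng(11)
t = np.arange(250)
signal = np.sin(2 * np.pi * t / 60)
a = signal + rng.normal(0, 0.02, t.size)
b = 0.5 * signal + 1.0 + rng.normal(0, 0.02, t.size)
rate, p = sign_agreement_pvalue(a, b)  # high agreement rate, small p-value
```

Because only the signs of the differences are used, the different sensor sensitivities (scale and offset) drop out automatically, which matches the stated goal of comparing trends rather than values.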

Do you know what the most commonly used time series methods are for biologists? For instance, I have NDVI, which basically measures "how green" a tree canopy, shrub, or grass is.

#### Cláudio Silva

##### New Member
> Segmented regression applies a different regression to each slice of the data. It is one way to approach time series: essentially, you analyze different portions of the series with different regressions. Among other things, I suspect this won't deal with autocorrelation.
>
> One thing that may or may not apply to your data is structural breaks, where the pattern changes dramatically. There are ways to test for this if you know where to look for the break.
The problem with slicing the data is that it relies on empirical choices you don't control. Since I'm trying to find a standard approach that deals with the variation between data points, I would need something more intuitive. Nonetheless, it is a method I might consider.