ARIMA models

noetsi

Fortran must die
#1
I have two models. One applies only a seasonal difference; the other applies both a seasonal and a non-seasonal difference. The latter is stationary according to the ADF test; the former is not. But the former passes the Box-Ljung test, meaning there is no evidence of serial correlation, while the latter (the one that is stationary) fails it, and it's not close. The stationary model also generates numbers that are completely impossible for our organization (absurdly so, given past spending).

I have been unable to find any model that is stationary according to the ADF test and that also generates realistic numbers or passes the Box-Ljung test for no serial correlation.
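
For anyone following along, here is a rough sketch of the two differencing schemes and the Box-Ljung Q statistic in plain Python. The series is made up (trend + yearly pattern + noise), not my spending data, and the helper names are my own. In practice the test is run on model residuals; here it is applied to the differenced series directly, just to show the mechanics.

```python
import random

def difference(series, lag=1):
    """Return the lag-`lag` difference: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q = n(n+2) * sum over k=1..max_lag of r_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# Made-up monthly series (5 years): linear trend + yearly pattern + noise.
rng = random.Random(0)
season = [0, 3, 8, 5, 1, -2, -6, -9, -4, 0, 2, 2]
y = [100 + 2 * t + season[t % 12] + rng.gauss(0, 1) for t in range(60)]

d12 = difference(y, lag=12)       # seasonal difference only
d12_d1 = difference(d12, lag=1)   # seasonal + non-seasonal difference

q_seasonal = ljung_box_q(d12, max_lag=12)
q_both = ljung_box_q(d12_d1, max_lag=12)
print(len(d12), len(d12_d1))
print(round(q_seasonal, 2), round(q_both, 2))
```

Q is compared against a chi-square distribution; a large Q relative to the critical value means the "no serial correlation" null is rejected. Note that each difference also costs observations (12 for the seasonal one, 1 more for the regular one).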
 

staassis

Active Member
#2
Could you try [Model 3] = [Model 1] + [deterministic polynomial trend]? If those were the true dynamics, [Model 3] would still be estimated consistently even if [Model 1] were non-stationary.

Regarding [Model 2]: when there is too much differencing, forecasts of the original process are oftentimes weird. But you probably know this already.
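
A toy illustration of why extra differencing can give strange forecasts, using the simplest possible cases (no AR or MA terms, no drift). The numbers and function names are invented for the example. With one difference the forecast stays at the last value; with two differences it extrapolates the slope of the last two observations, so a single unusual final point gets projected forward indefinitely.

```python
def forecast_d1(history, horizon):
    """ARIMA(0,1,0) without drift: every forecast equals the last observation."""
    return [history[-1]] * horizon

def forecast_d2(history, horizon):
    """ARIMA(0,2,0): forecasts continue the slope of the last two observations."""
    last = history[-1]
    slope = history[-1] - history[-2]
    return [last + h * slope for h in range(1, horizon + 1)]

# Hypothetical series that hovers near 100-103 and then jumps in the last period.
spend = [100, 102, 101, 103, 102, 110]

print(forecast_d1(spend, 12)[-1])  # 110
print(forecast_d2(spend, 12)[-1])  # 110 + 12 * 8 = 206
```

A real model with AR/MA terms and seasonal differencing behaves less crudely, but the same mechanism is at work: each extra difference adds another polynomial order to the forecast function.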

How much data do you have this time? What do AIC and BIC say?
 

noetsi

Fortran must die
#3
I did not know that differencing caused problems for prediction, although that makes sense.

I am not sure what the deterministic trend would be. Just a square? A cubic?
 

staassis

Active Member
#4
Depends on how much data you've got. A cubic trend might work. In general, the order of the polynomial could be determined by AIC or BIC.
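
A minimal sketch of picking the polynomial order by AIC/BIC, in plain Python so it can be pasted anywhere. It is a simplification: the trend is fitted by ordinary least squares on its own, without the ARIMA part, and the data are simulated from a quadratic. The AIC/BIC formulas are the Gaussian least-squares versions, up to a constant that is the same for every candidate. All names are made up for the example.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def poly_rss(x, y, degree):
    """Least-squares polynomial fit via the normal equations; return the RSS."""
    k = degree + 1
    xtx = [[sum(v ** (i + j) for v in x) for j in range(k)] for i in range(k)]
    xty = [sum((v ** i) * w for v, w in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v ** i for i, b in enumerate(beta)) for v in x]
    return sum((w - f) ** 2 for w, f in zip(y, fitted))

def aic_bic(rss, n, n_params):
    """Gaussian AIC and BIC, up to an additive constant shared by all models."""
    base = n * math.log(rss / n)
    return base + 2 * n_params, base + n_params * math.log(n)

# Simulated data: a quadratic trend plus noise (the true order is 2).
rng = random.Random(1)
n = 40
x = [t / (n - 1) for t in range(n)]  # time rescaled to [0, 1] for stability
y = [10 + 5 * v + 30 * v ** 2 + rng.gauss(0, 1) for v in x]

scores = {d: aic_bic(poly_rss(x, y, d), n, d + 1) for d in range(5)}
best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print(best_aic, best_bic)
```

BIC penalizes each extra coefficient by log(n) rather than 2, so with short series it tends to choose the lower order of the two criteria.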
 

staassis

Active Member
#6
This is too little data for double differencing or any big model. A rule of thumb says: at least 10-15 observations per estimated parameter, depending on the noise conditions and the framework. So all those estimates, Box-Ljung tests and ADF tests are not terribly accurate.
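
The arithmetic behind that rule of thumb, as a quick sketch. The sample size below is hypothetical, chosen only to show the calculation, and the function name is made up.

```python
def max_parameters(n_obs, obs_per_param):
    """Largest number of parameters the rule of thumb allows for n_obs points."""
    return n_obs // obs_per_param

n_obs = 48  # hypothetical: e.g. four years of monthly data
for ratio in (10, 15):
    print(ratio, max_parameters(n_obs, ratio))
# 10 observations per parameter -> 4 parameters
# 15 observations per parameter -> 3 parameters
```

Keep in mind that differencing shortens the usable series, so the effective sample is smaller still.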

I guess you could try single differencing + AR(1) + quadratic deterministic trend.
 

noetsi

Fortran must die
#7
I found this comment fascinating since I spent a lot of time learning the classical way to identify PDQ components. For one thing, I never realized these were only theoretical (I knew they broke down when both MA and AR components are present).

"This hazard is revealed by sampling experiments. When the data come from the real world, the notion that there is an underlying ARMA process is a fiction, and the business of model identification becomes more doubtful. Then there may be no such thing as the correct model; and the choice amongst alternative models must be made partly with a view their intended uses."

https://www.le.ac.uk/users/dsgp1/COURSES/THIRDMET/MYLECTURES/4XIDNTIFY.pdf
 
#8
Like with everything, the model is just a model. Whatever parametric paradigm one may choose, the truth may not belong there.

The issue is known as model bias. As with any type of bias, we may even want to introduce it intentionally if the resulting estimation procedure has a much smaller variance and the mean-square error (MSE) decreases as a result. Say we know that the truth is a spline with a very high degree of wiggliness. "Who cares?" we say to ourselves. "With only 100 data points at hand, a cubic spline is the best we can do." And we are right. A cubic spline will deliver a lower MSE than a spline of order 10.
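
For reference, the trade-off comes from the standard decomposition of the MSE of an estimator at a fixed point (the notation here is mine: f is the true value, \hat f the estimate):

```latex
\operatorname{MSE}(\hat f)
  = \mathbb{E}\big[(\hat f - f)^2\big]
  = \big(\mathbb{E}[\hat f] - f\big)^2 + \operatorname{Var}(\hat f)
```

The first term is the squared bias and the second is the variance. A simpler model raises the first term but can lower the second by more, which is the cubic-spline case above.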
 