# Time series: ARIMA, ARIMAX and R

#### Amaterasu

Hello,
I am not a statistician but a machine-learning / computer-science person. Recently I have read quite a bit about ARIMA and ARIMAX and how to use them in R. I would be grateful if you could check my workflow and point out any mistakes. Thanks in advance!
1. Pre-processing
The two time series (A, B) were not stationary, so I applied a log transform and then differenced twice to make them stationary:
MydataLog <- data.frame(A = log(Mydata$A), B = Mydata$B)   # log() was not applied to B
MydataDiff1 <- data.frame(A = diff(MydataLog$A), B = diff(MydataLog$B))
MydataDiff2 <- data.frame(A = diff(MydataDiff1$A), B = diff(MydataDiff1$B))
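In base R the two nested `diff()` calls can be collapsed into one with the `differences` argument. A minimal sketch on a simulated positive series (the simulated `x` is a hypothetical stand-in for `Mydata$A`, not the poster's data):

```r
# Simulated positive, trending series (hypothetical stand-in for Mydata$A)
set.seed(42)
x <- exp(cumsum(rnorm(100, mean = 0.01, sd = 0.05)))

logx <- log(x)
# Differencing the logged series twice, step by step ...
d2_stepwise <- diff(diff(logx))
# ... gives the same result as a single call with differences = 2
d2_direct <- diff(logx, differences = 2)

all.equal(d2_stepwise, d2_direct)  # TRUE
length(d2_direct)                  # 98: each round of differencing drops one observation
```

Because each difference drops one observation, the columns of a data frame like `MydataDiff2` stay aligned only when A and B are differenced the same number of times.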
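- Test for stationarity
library(tseries)   # adf.test() comes from the tseries package
adf.test(MydataDiff2$A, alternative = "stationary")
adf.test(MydataDiff2$B, alternative = "stationary")
(After log + diff + diff, p-value = 0.01 < 0.05, so I concluded the series are stationary. Note that 0.01 is the smallest p-value adf.test() prints; the actual value is smaller.)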
data: MydataDiff2$A
Dickey-Fuller = -6.7353, Lag order = 3, p-value = 0.01
alternative hypothesis: stationary

2. Cross-correlation function
# try candidate models until the residuals are white noise; ARIMA(1,0,2) for B
library(forecast)   # Arima() is used so the fitted model can be re-applied to A below
model1 <- Arima(MydataDiff2$B, order = c(1, 0, 2))
residuals1<-residuals(model1)
Box.test(residuals1, type = "Ljung-Box", lag = round(log(length(residuals1))))   # lag must be a whole number
yfiltered <- residuals(Arima(MydataDiff2$A, model = model1))   # filter A with the model fitted to B
Box.test(yfiltered, type = "Ljung-Box", lag = round(log(length(yfiltered))))
ccAB <- ccf(residuals1, yfiltered)

Is this the correct way to perform prewhitening and compute the cross-correlation between A and B? My CCF shows a significant correlation only at lag 0, so Y(t) depends only on X(t) and not on X(t-1), etc. What is the 5% significance threshold? Some say the correlation must be greater than 2/sqrt(N), others say greater than 2/sqrt(N)sqrt(N-k).

3. ARIMAX
Because I saw only one significant correlation, at lag 0, am I right that I am fitting a dynamic regression of the following form:

$$Y_t = b_0 + b X_t + n_t$$
$$n_t = a_1 n_{t-1} + \dots + a_p n_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q} + e_t$$

I start with an ARIMA(1,0,2) error model, with X = A and Y = B:
model <- Arima(MydataDiff2$B, xreg = MydataDiff2$A, order = c(1, 0, 2))
res2 <- residuals(model)
Box.test(res2, type = "Ljung-Box", lag = round(log(length(res2))))   # result was white noise
# refine the error model:
ny <- residuals(model, type = "regression")   # regression errors n(t); arima.errors(model) in older forecast versions
tsdisplay(ny, main = "ARIMA errors")
# a candidate for ny, for example ARIMA(3,0,2), i.e. AR(3) + MA(2)
model1 <- Arima(ny, xreg = MydataDiff2$A, order = c(3, 0, 2))
ny1 <- residuals(model1, type = "regression")

# test e(t) for white noise
Box.test(residuals(model1), lag=10, type = "Ljung")
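In the textbook (Box-Jenkins) prewhitening order, the ARIMA filter is estimated on the input series X (here A) and the same, unchanged filter is then applied to the output Y (here B) before computing the CCF; the workflow above estimates the filter on B instead. A minimal base-R sketch of the textbook order, using simulated data; `x`, `y`, `alpha`, `beta` and the simulated model are illustrative assumptions, not the poster's data. It also shows where R's own CCF plot draws its dashed 95% bounds: at `qnorm(0.975) / sqrt(N)`, i.e. about 1.96/sqrt(N).

```r
# Simulated stand-ins: input x follows an ARMA(1,2); output y responds to x at lag 0
set.seed(1)
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.6, ma = c(0.3, 0.2)), n = n))
y <- 2 * x + rnorm(n)

# Step 1: estimate the prewhitening filter on the INPUT series x
fit_x <- arima(x, order = c(1, 0, 2))
alpha <- as.numeric(residuals(fit_x))

# Check that the filter whitens x; fitdf = p + q = 3 adjusts the degrees of freedom
Box.test(alpha, lag = 10, type = "Ljung-Box", fitdf = 3)

# Step 2: pass y through the SAME filter: ARMA coefficients held fixed,
# only y's own mean (the NA entry) is estimated
arma_coef <- coef(fit_x)[c("ar1", "ma1", "ma2")]
fit_y <- arima(y, order = c(1, 0, 2),
               fixed = c(arma_coef, NA), transform.pars = FALSE)
beta <- as.numeric(residuals(fit_y))

# Step 3: CCF of the two filtered series; lag k estimates cor(alpha[t + k], beta[t])
cc <- ccf(alpha, beta, lag.max = 10, plot = FALSE)
lags <- drop(cc$lag)
r <- drop(cc$acf)

# The dashed lines drawn by plot(cc) sit at +/- qnorm(0.975) / sqrt(N)
bound <- qnorm(0.975) / sqrt(cc$n.used)
lags[abs(r) > bound]   # lags whose sample cross-correlation crosses that bound
```

With 21 lags tested at the 5% level, roughly one lag can cross the bound by chance even when there is no relationship, so an isolated crossing away from lag 0 deserves some caution.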

#### noetsi

I don't know R, but I will make some comments. If you are using ARIMAX, then you pre-whiten both series first (create a (p,d,q) ARIMA model), which is what I assume you did. ADF has serious power issues, so you should test for stationarity not just with it but also with one of the stationarity tests that has the opposite null hypothesis. If both tests show there is a trend (non-stationarity), then you can be more confident in your results. As far as I know, logging does not deal with non-stationarity; it deals with variance issues.
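One commonly used test with the opposite null to ADF is KPSS, whose null hypothesis is stationarity. A minimal sketch, assuming the tseries package (already used above for `adf.test()`) is installed; the helper name `check_stationarity()` and the simulated series are hypothetical:

```r
# Hypothetical helper: run ADF and KPSS side by side (requires the tseries package)
#   ADF:  H0 = unit root (non-stationary); small p-value -> reject the unit root
#   KPSS: H0 = trend-stationary;           small p-value -> reject stationarity
check_stationarity <- function(x, alpha = 0.05) {
  if (!requireNamespace("tseries", quietly = TRUE)) {
    stop("this sketch needs install.packages(\"tseries\")")
  }
  # tseries interpolates p-values from tables and clips them to a fixed range,
  # warning when it does; those warnings are silenced here
  adf <- suppressWarnings(tseries::adf.test(x, alternative = "stationary"))
  # adf.test() regresses on a constant and a linear trend, so the matching
  # KPSS null is "Trend"
  kpss <- suppressWarnings(tseries::kpss.test(x, null = "Trend"))
  list(
    adf_p  = unname(adf$p.value),
    kpss_p = unname(kpss$p.value),
    # the two tests agree on stationarity only if ADF rejects and KPSS does not
    agree_stationary = adf$p.value < alpha && kpss$p.value >= alpha
  )
}

set.seed(7)
white <- rnorm(300)           # stationary by construction
walk  <- cumsum(rnorm(300))   # random walk: has a unit root
if (requireNamespace("tseries", quietly = TRUE)) {
  str(check_stationarity(white))
  str(check_stationarity(walk))
}
```

If ADF rejects and KPSS does not, the series looks stationary from both directions; if ADF fails to reject and KPSS rejects, both point to a unit root; mixed results are a sign that neither test is decisive for that series.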

I am not sure what you mean by dynamic regression. Different authors use the term for different things.
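In the sense written in the question (a regression on X(t) whose error n(t) follows an ARMA(p, q) process), base R's `arima()` fits the model directly through its `xreg` argument, and `forecast::Arima()` builds on the same estimator. A minimal sketch with simulated data; the coefficients and the ARMA(1,1) order are illustrative assumptions. Note that R writes the MA part with plus signs, so its `ma` coefficients correspond to -θ in the question's minus-sign convention.

```r
# Simulated example of the model in the question (all numbers are illustrative):
#   Y(t) = b0 + b * X(t) + n(t),  with n(t) an ARMA(1,1) error
set.seed(123)
n   <- 300
x   <- rnorm(n)                                                        # input X(t)
eta <- as.numeric(arima.sim(model = list(ar = 0.5, ma = 0.4), n = n))  # error n(t)
y   <- 1 + 2 * x + eta                                                 # output Y(t)

# Regression on X(t) with ARMA(1,1) errors, regression and error model estimated jointly
fit <- arima(y, order = c(1, 0, 1), xreg = x)
b   <- coef(fit)   # named "ar1", "ma1", "intercept" (b0) and "x" (b)
b

# Regression errors n(t) = Y(t) - b0 - b * X(t); for an Arima() fit,
# forecast's residuals(fit, type = "regression") returns the same quantity
n_hat <- y - unname(b["intercept"]) - unname(b["x"]) * x

# The innovations e(t) should look like white noise; fitdf = p + q = 2
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
```

After identifying a new order from `n_hat`, the usual next step is to re-fit Y itself with that order and the same `xreg` (for example `arima(y, order = c(3, 0, 2), xreg = x)`), so the regression coefficient and the error model are re-estimated together rather than regressing X out of the errors a second time.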