I am not a statistician; my background is in machine learning and computer science. Recently I have read quite a bit about ARIMA and ARIMAX and how to use them in R. I would be grateful if you could check my workflow and point out any mistakes. Thanks in advance!

1. Pre-processing

The two time series (A, B) were not stationary, so I took the log of A and then differenced both series twice to make them stationary.

MydataLog <- data.frame(A = log(Mydata$A), B = Mydata$B)  # log applied to A only, not to B

MydataDiff1 <- data.frame(A = diff(MydataLog$A), B = diff(MydataLog$B))  # columns named so that $A and $B work

MydataDiff2 <- data.frame(A = diff(MydataDiff1$A), B = diff(MydataDiff1$B))
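A minimal sketch of the same pre-processing on simulated data, since `Mydata` itself is not shown. The names `sim`, `A_d2`, `B_d2` and `simDiff2` are made up for illustration. It assumes only base R and shows that `diff(..., differences = 2)` gives the same result as calling `diff()` twice, and that naming the data frame columns is what makes `$A` and `$B` usable afterwards.

```r
# Simulated stand-in for Mydata (hypothetical values, for illustration only)
set.seed(1)
n <- 60
sim <- data.frame(A = exp(cumsum(cumsum(rnorm(n, sd = 0.01)))),  # positive, so log() is defined
                  B = cumsum(cumsum(rnorm(n))))

# Log A only, then difference both series twice in a single call
A_d2 <- diff(log(sim$A), differences = 2)
B_d2 <- diff(sim$B, differences = 2)

# Same result as applying diff() twice by hand
same_as_manual <- isTRUE(all.equal(A_d2, diff(diff(log(sim$A)))))

# Named columns, so simDiff2$A and simDiff2$B work later on
simDiff2 <- data.frame(A = A_d2, B = B_d2)
nrow(simDiff2)  # double differencing drops two observations
```

Each round of differencing shortens the series by one observation, which matters for the lag choices in the tests further down when the series is short.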

- Test for stationarity

adf.test(MydataDiff2$A, alternative="stationary")  # adf.test() is from the tseries package

adf.test(MydataDiff2$B, alternative="stationary")

(p-value = 0.01 < 0.05 => the unit-root null is rejected, i.e. the series is stationary after log + diff + diff)

data: MydataDiff2$A

Dickey-Fuller = -6.7353, Lag order = 3, p-value = 0.01

alternative hypothesis: stationary

2. Cross-correlation function

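# fit some ARIMA model to B whose residuals are white noise, e.g. ARIMA(1,0,2)

model1 <- Arima(MydataDiff2$B, order=c(1, 0, 2))  # Arima() from the forecast package, so the fit can be reused via model= below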

residuals1<-residuals(model1)

Box.test(residuals1, type='Ljung',lag=log(length(residuals1)))

yfiltered <- residuals(Arima(MydataDiff2$A, model=model1))

Box.test(yfiltered, type='Ljung',lag=log(length(yfiltered)))

c <- ccf(residuals1, yfiltered)

Is this the correct way to perform prewhitening and compute the cross-correlation between A and B?

My ccf shows a significant correlation only at lag 0 => Y(t) depends only on X(t) and not on X(t-1), etc.

What is the right 5% significance threshold for the cross-correlations? Some say greater than 2/sqrt(N), others say greater than 2/sqrt(N)sqrt(N-k)
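To make the prewhitening steps concrete, here is a minimal base-R sketch on simulated data. Everything in it is an assumption for illustration: the series `x` and `y`, the AR(1) filter, and the names `fit_x`, `x_pw`, `y_pw`, `cc`, `bound`. It fits a model to the input series, applies the same filter to the output series, and cross-correlates the two filtered series. The threshold computed at the end is the one R itself draws as dashed lines when plotting an `acf`/`ccf` object at the default 95% level, namely `qnorm(0.975)/sqrt(n.used)`, which is roughly 2/sqrt(N).

```r
# Simulated stand-ins for the two stationary series (hypothetical data)
set.seed(42)
n <- 200
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # input series X, AR(1)
y <- 0.8 * x + rnorm(n, sd = 0.5)                  # Y depends on X at lag 0 only

# 1. Fit an ARMA model to X; its residuals are the prewhitened X
fit_x <- arima(x, order = c(1, 0, 0))
x_pw  <- as.numeric(residuals(fit_x))

# 2. Apply the same AR filter (1 - phi*L) to the centred Y
phi  <- unname(coef(fit_x)["ar1"])
y_pw <- as.numeric(stats::filter(y - mean(y), filter = c(1, -phi), sides = 1))

# 3. Cross-correlate the two filtered series (the first value of y_pw is NA)
keep <- !is.na(y_pw)
cc   <- ccf(x_pw[keep], y_pw[keep], lag.max = 10, plot = FALSE)

# Threshold that plot(cc) would draw as dashed lines at the 95% level
bound <- qnorm(0.975) / sqrt(cc$n.used)
lag0  <- cc$acf[cc$lag == 0]
significant_lags <- cc$lag[abs(cc$acf) > bound]
```

With about 20 lags inspected, roughly one spurious exceedance of a 5% threshold is expected by chance, so an isolated small spike away from lag 0 is weak evidence on its own.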

3. ARIMAX: because I saw only one significant correlation, at lag 0, am I applying a dynamic regression of the following type?

Y(t) = b0 + bX(t) + n(t)

n(t) = a1*n(t-1) + … + ap*n(t-p) - Q1*e(t-1) - … - Qq*e(t-q) + e(t)
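The same model can be written compactly with a lag operator. This is only a restatement of the two equations above; the symbols $L$, $\phi$ and $\theta$ are notation introduced here ($L$ is the lag operator, $L\,n_t = n_{t-1}$, and is unrelated to series B).

```latex
y_t = b_0 + b\,x_t + n_t, \qquad \phi(L)\,n_t = \theta(L)\,e_t,
\\[4pt]
\phi(L) = 1 - a_1 L - \dots - a_p L^p, \qquad
\theta(L) = 1 - Q_1 L - \dots - Q_q L^q .
```

One point to keep in mind when reading R output: `arima()` documents its MA part with plus signs, $e_t + b_1 e_{t-1} + \dots + b_q e_{t-q}$, so the MA coefficients it reports correspond to $-Q_1, \dots, -Q_q$ in the notation above.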

Start with some model, e.g. ARIMA(1,0,2) errors, with X = A and Y = B:

model<-Arima(MydataDiff2$B, xreg = MydataDiff2$A, order=c(1, 0, 2))

res2<-residuals(model)

Box.test(res2, type='Ljung',lag=log(length(res2)))

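# the residuals looked like white noise

# refine the model:

ny <- residuals(model, type = "regression")  # regression errors n(t); replaces the deprecated arima.errors(model)

tsdisplay(ny, main = "ARIMA errors")

# from the ACF/PACF of ny, a candidate is e.g. AR order 3 and MA order 2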

model1<-Arima(ny, xreg = MydataDiff2$A, order=c(3, 0, 2))

ny1 <- residuals(model1, type = "regression")  # replaces the deprecated arima.errors(model1)

# test the innovations e(t) for white noise

Box.test(residuals(model1), lag=10, type = "Ljung")
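For comparison, here is a minimal base-R sketch of a regression with ARMA errors fitted in one step on simulated data. The data, the true coefficients, and the names `fit`, `eta_hat`, `innov` and `lb` are all assumptions for illustration. It uses `stats::arima()` with `xreg` rather than `forecast::Arima()` so that it runs without extra packages, and it distinguishes the regression errors n(t) from the innovations e(t), which is where the Ljung-Box test is applied.

```r
# Simulated example: y = 1 + 2*x + n(t), with AR(1) errors n(t) (hypothetical data)
set.seed(7)
n   <- 200
x   <- rnorm(n)
eta <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y   <- 1 + 2 * x + eta

# Fit regression and error model jointly: response is y, regressor goes in xreg
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))
coef(fit)  # ar1, intercept, and the regression coefficient named "x"

# Regression errors n(t): response minus the fitted regression part
eta_hat <- y - coef(fit)["intercept"] - coef(fit)["x"] * x

# Innovations e(t): what is left after the ARMA error model as well
innov <- residuals(fit)

# Ljung-Box test on the innovations; fitdf = p + q adjusts the degrees of freedom
lb <- Box.test(innov, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value
```

In this sketch the response passed to `arima()` is always the original `y`; the regression errors `eta_hat` are used only to choose the ARMA orders, not as a new response.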