What's the best way to analyse the correlation between these two variables?

Hello All,
I have to admit my statistics skills are not amazing; however, I'm very interested in improving my knowledge. I've been trying to work out a way to see if there is a correlation between the number of troops in Iraq and Afghanistan (BOG) and the price of an index that only includes companies with significant numbers of military contracts. It's something I've been interested in for a while, and I've been wondering if anything might come of it (most likely not). However, I want to understand how I would go about analysing any relationship between the two, especially since the data is only monthly.

I've done some reading regarding war risk and how one could analyse it, and I've seen a few heteroskedastic methods used, including GARCH models, in the following papers:
Heteroskedasticity in Stock Return Data: Volume versus GARCH Effects
What Do Financial Markets Think of War in Iraq?
The Effect of War Risk On U.S. Financial Markets

However, these are all generally concerned with analysing news events and seeing whether they have any effect on stock prices.

Basically, I was wondering if anyone could provide me with a few insights into how I should go about analysing any relationship between the two. I've already done some cross-correlation between the two, but I'm not sure what to do with it at the moment.

Sorry if this seems really vague; I was just hoping for some help / pointers so I can start to explore stats with a topic I'm interested in.

Thanks in advance, and I look forward to hearing from you guys. Also, I have some data if anyone wants to have a look at what I can play with.


No cake for spunky
To start with, you need to clarify whether you are analyzing it as a time series or as a cross-sectional variable. To me the former makes the most sense (I would think that the price index is a lagging indicator), and you seem to see it this way as well (I have not worked with GARCH, so I don't know its assumptions).

If you are doing it as a time series, you need to pick one of those methods, such as ARIMA, and proceed from there (which is easier to say than to do, I know). I would point out that Duke University has a large website that deals with ARIMA if you go that route, and you might try this for time series if you work in R.


Substantively, I think you would find that the past response of these companies to the war abroad is likely changing dramatically given recent changes in budget policy (that is, the sequestration and the decline in Republican Party support for national security spending). That will likely mean future (or recent past) results are very different from what they were, say, three years ago.

So you will need an indicator to show when this transformation occurred, or you will need to estimate two different models.
Thank you for the insightful comment.
I will have a look into it now and try to find some worked examples so I can see the process. Hopefully it will give me something to work from, as I'm a little confused by the Wikipedia page on it.

I think I've found a few sources to look at to see if I can understand the ARIMA model properly and maybe do it in MINITAB, as I have a working understanding of that program. My R skills are very minimal.

My idea was that any change in military budget would be reflected in the troop levels, which is why I chose it.

If I were to post some figures, would you mind helping me with ways to work out what the lag should be? I've been using a rough estimate of 25% variables as my lag just to see what it looks like.
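One common starting point for picking a candidate lag is to scan the cross-correlation over a range of lags and see where it peaks. The sketch below is only an illustration: the data are made up (y is x delayed by 3 months), `cross_corr` is a hypothetical helper, and with real troop and index data the series would normally be differenced first, since cross-correlations between trending series can be spurious.

```python
import random

def cross_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag].

    A positive lag means x leads y by `lag` periods.
    """
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical data: 60 "months" of noise, with y simply x delayed by 3 months.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.0, 0.0, 0.0] + x[:-3]

# Scan lags from -6 to +6 and keep the one with the largest absolute correlation.
correlations = {lag: cross_corr(x, y, lag) for lag in range(-6, 7)}
best_lag = max(correlations, key=lambda k: abs(correlations[k]))
print(best_lag, round(correlations[best_lag], 3))
```

The peak only suggests a lag worth testing in a proper model; it is not evidence of an effect on its own.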

Currently reading this:
http://delbecque.free.fr/chap07 - ARIMA.pdf

However, for now I will investigate the ARIMA model further.


No cake for spunky
I am not the right person to help with estimating a lag, as I am relatively new to time series. The ARIMA models I have done are for trends, say of budget data. Those are (I guess) a single variable, although the ARIMA material I have seen does not talk about variables; it talks about trends in a time series.
It's been a little while since I've thought of these things so I'm a bit fuzzy on it, but I'll give it a shot.

Really sorry if this sounds stupid, but can I only use ARIMA on one variable? How do I go about seeing the effect of one on another?
In my experience, ARIMA models are generally limited to the univariate case unless you group them together into a single vector as suggested here:

Looking at the previous links, such as the Duke one and the one I just pasted, it appears that most of the emphasis in optimal model selection is placed on the results of the autocorrelation and partial autocorrelation functions (ACF and PACF). This is certainly one way to go about it. There is also something popular called Box-Jenkins model selection for ARIMA models, so I would take a look at that as well. They're just different methods of optimal lag selection and, depending on your objective, you can decide how deep you want to dive into the selection process.

Do note, however, that you need to be looking at stationary series with this (I'm pretty sure). Hence the common ARIMA(p,d,q) specification, where p is the number of autoregressive lags, d is the order of differencing (for nonstationary time series this is usually 1, but be sure to check the data), and q is the number of moving average terms.
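To make the d term concrete, here is a minimal sketch of first differencing and the sample ACF in plain Python. The series is invented (a straight-line trend plus a small alternating wiggle), and `difference` and `acf` are hypothetical helper names; a stats package would normally do this for you.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'd' in ARIMA(p,d,q))."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Hypothetical trending series: a straight line plus a small alternating wiggle.
levels = [2.0 * t + (1 if t % 2 == 0 else -1) for t in range(40)]
diffed = difference(levels, d=1)

acf_levels = acf(levels, 3)   # dominated by the trend, decays very slowly
acf_diffed = acf(diffed, 3)   # trend removed, only the wiggle is left
print([round(r, 2) for r in acf_levels])
print([round(r, 2) for r in acf_diffed])
```

An ACF that stays close to 1 and dies out slowly, as in the undifferenced series here, is the usual visual hint that differencing is needed before choosing p and q.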

Addressing the second part of your question, ARIMA models do not inherently check the effects of one variable on another. To look at the multivariate realm (and thus the cross-correlation), I would move into VAR models (vector autoregression). They are basically just an extension of the autoregressive (AR) model into the multivariate realm, where each time series can affect the others in the same model. This may seem like a complex model, but the great news is that it can be estimated by equation-by-equation OLS!
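As a rough illustration of the equation-by-equation OLS point, the sketch below fits a two-variable VAR(1) in plain Python. Everything here is an assumption for demonstration: the data are simulated with known coefficients, and `solve`, `ols` and `fit_var1` are hypothetical helpers rather than anything from a library.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def fit_var1(x, y):
    """Fit a VAR(1): each series is regressed on a constant and both first lags."""
    X = [[1.0, x[t - 1], y[t - 1]] for t in range(1, len(x))]
    return ols(X, x[1:]), ols(X, y[1:])

# Simulated, hypothetical data: x depends on its own lag, y on lagged x and lagged y.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(800):
    x_prev, y_prev = x[-1], y[-1]
    x.append(0.5 * x_prev + rng.gauss(0, 1))
    y.append(0.4 * x_prev + 0.3 * y_prev + rng.gauss(0, 1))

eq_x, eq_y = fit_var1(x, y)  # each is [constant, coef on x lag, coef on y lag]
print([round(b, 2) for b in eq_x])
print([round(b, 2) for b in eq_y])
```

The coefficient on lagged x in the y equation is the piece that speaks to "does one series help predict the other". With real data you would also need more than one lag and the stationarity checks discussed in this thread.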

Associated with VAR results are impulse response functions (IRFs), which show how a one-unit shock to one time series affects another over time. A related tool is variance decomposition. You should look into both of them.
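For a VAR(1) with coefficient matrix A, the simple (non-orthogonalized) response at horizon h to a one-unit shock is just A raised to the power h. The sketch below uses a made-up coefficient matrix and hypothetical helper names; software packages usually report orthogonalized responses, which also account for correlation between the shocks and are not shown here.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def impulse_responses(A, horizons):
    """Return [I, A, A^2, ...]: entry [i][j] at step h is the response of
    variable i, h periods after a one-unit shock to variable j."""
    n = len(A)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    responses = [current]
    for _ in range(horizons):
        current = mat_mul(A, current)
        responses.append(current)
    return responses

# Hypothetical VAR(1) coefficients; variable 0 is x, variable 1 is y.
A = [[0.5, 0.0],
     [0.4, 0.3]]

irf = impulse_responses(A, 3)
y_to_x_shock = [step[1][0] for step in irf]  # response of y to a shock in x
x_to_y_shock = [step[0][1] for step in irf]  # response of x to a shock in y
print([round(v, 3) for v in y_to_x_shock])
print([round(v, 3) for v in x_to_y_shock])
```

In this toy setup a shock to x feeds into y with a one-period delay and then fades, while a shock to y never reaches x because the corresponding coefficient is zero.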

As with the univariate ARMA, you will have to check whether your time series are stationary. Not only that, but now you'll also have to check whether they're cointegrated! The differencing technique used in the ARMA realm is generally avoided where possible in the VAR realm, as you may lose information when you difference cointegrated series. If they are cointegrated, you'd want to try to estimate a cointegrating vector, if possible, to avoid losing that information. There is quite a bit of literature on this out there, so feel free to read up on it. It may seem overwhelming, but it is all pretty relevant and important. Here's a link with some basic info (I just skimmed it after a quick Google search, so I don't quite know the quality):

Hamilton wrote a time series econometrics book (Time Series Analysis) which is used almost religiously in econometrics classrooms. If you want to get down into the details, I'd highly recommend that book.

This was a bit long-winded, but I hope it helped. I haven't done serious time series analysis in some time, so there might be shaky parts that should be checked when you're reading through some of the literature.

Good luck!


No cake for spunky
I always thought Box-Jenkins and ARIMA were simply different names for the same thing (that is, Box and Jenkins created ARIMA, which was once called Box-Jenkins). Certainly ACF and PACF are central to the analysis of ARIMA (although there are some advanced methods, for instance one to suggest mixed AR/MA models, which is very difficult in the context of ACF and PACF analysis). It is absolutely true that ARIMA requires stationary data (detrended data) to work. That is why you specify d terms in it.

Could you give the cite for Hamilton? (I don't know his first name and would like to get the book you mentioned.)
Box-Jenkins and ARIMA are probably just different names for the same thing. Time may have made my memory a bit fuzzy. However, when I used to do ARIMA modeling, I generally referred to Box-Jenkins as the methodology for selecting the appropriate lag lengths for the AR and MA components. For instance, I would generate a stationary time series by taking the log and first difference of the series, if necessary. Then I would do my 'Box-Jenkins' model selection by running through each combination of AR and MA orders and finding the one that minimized AIC and the one that minimized SIC. I would then make sure these two 'optimal' models displayed the correct residual pattern (residuals that look like white noise) and then decide between the two. That was generally how I went about deciding the 'best' mixed model parameters for my univariate forecasts. I probably did a poor job of explaining that earlier, and that methodology may not be optimal, but it's along the lines of what I would do in my forecasting class before we moved into VAR models.
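A stripped-down version of that search can be sketched in plain Python. To keep it dependency-free it only loops over pure AR(p) models fitted by OLS, not the full AR/MA grid described above, and it uses simulated AR(2) data; the helper names are hypothetical. SIC is the same criterion that is often called BIC.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_rss(X, y):
    """Fit OLS via the normal equations and return the residual sum of squares."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def ar_information_criteria(series, max_p):
    """AIC and SIC for AR(0)..AR(max_p), all fitted on the same sample."""
    n = len(series) - max_p
    y = series[max_p:]
    results = {}
    for p in range(max_p + 1):
        X = [[1.0] + [series[t - j] for j in range(1, p + 1)]
             for t in range(max_p, len(series))]
        rss = ols_rss(X, y)
        k = p + 1  # intercept plus p lag coefficients
        results[p] = (n * math.log(rss / n) + 2 * k,
                      n * math.log(rss / n) + k * math.log(n))
    return results

# Simulated, hypothetical AR(2) series standing in for a stationary (differenced) series.
rng = random.Random(2)
series = [0.0, 0.0]
for _ in range(600):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0, 1))

results = ar_information_criteria(series, max_p=4)
best_aic = min(results, key=lambda p: results[p][0])
best_sic = min(results, key=lambda p: results[p][1])
print(best_aic, best_sic)
```

SIC penalizes extra parameters more heavily than AIC on samples of this size, so it never picks a larger model than AIC does; when the two disagree, the residual check described above is the tie-breaker.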

Here's a link to the time series book I mentioned by James Hamilton, Time Series Analysis:
It is a great econometric reference for time series and is quite math intensive (which allows everything to be proved nicely). Though, it is a bit pricey so unless you have some extra cash lying around you may want to look for an alternative or see if it exists in a local library (you could also search for the pdf via google).


No cake for spunky
Thanks for the link and the comments. ARIMA is something I painfully "learned" and then forgot most of. Now I have to learn it again. :(