HELP regression with percentage and current $

I am trying to construct a regression with inflation as my dependent variable and GDP and income as my independent variables, but I am not sure it can be done, since inflation is a percentage whereas income and GDP are in current $.

Does that make a difference in my OLS regression? Do I just have to write, e.g., 5.6 percent as 0.0056?
First: 5.6% = 0.056
Second, I would convert inflation to $, keeping in mind that inflation is usually given as a rate, e.g. 2.9% per year.
Or use the consumer price index instead, which is used to calculate inflation rates anyway.
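
To make the two points above concrete, here is a minimal sketch, assuming a hypothetical list of annual CPI index values (the numbers are made up for illustration). It shows how an inflation rate is derived from the index, how the same rate is written as a percentage and as a decimal fraction, and that the log difference of the index is close to the rate when the rate is small.

```python
import math

# Hypothetical annual CPI index values (illustrative only, not real data).
cpi = [100.0, 102.9, 105.0]

def inflation_rates(index):
    """Period-over-period inflation as a decimal fraction (0.029 means 2.9%)."""
    return [(curr - prev) / prev for prev, curr in zip(index, index[1:])]

def log_differences(index):
    """Difference of natural logs, an approximation to the rate above."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(index, index[1:])]

rates = inflation_rates(cpi)
log_rates = log_differences(cpi)
for r, lr in zip(rates, log_rates):
    print(f"{r * 100:.2f}% per year = {r:.4f} as a fraction; log difference = {lr:.4f}")
```

Whichever form is used, the regression fit is the same; only the scale of the estimated coefficient changes by a factor of 100.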


You could also consider the fact that GNP and income are trending variables. Consider whether you really want to include both GNP and income, whatever income might mean. Depending on your theory, GNP and income could simply be the same.
Edit: I agree with Jesper. Why are you including both income and GDP? What's the motivation? You should also consider adding interest rates, which is quite common in the literature.

Edit: percentages and dollars don't matter. Just take the natural log of all the variables and they'll be on a similar scale (it is completely standard to log these variables in cointegration analysis).

You don't want OLS with these variables. First, make sure that your GDP and income data are in constant (real) prices rather than current (nominal) prices. You describe your data as being in current $, which normally means nominal, so check this and deflate the series if needed. This is important since a sufficiently powered statistical test will always detect a relationship between inflation and any nominal series, because inflation accounts for part of the variance of that series. You might also consider adding stock market levels to the set of variables if that fits your research question.

These variables are non-stationary and most likely cointegrated. You should first do a Johansen test, using the lag length selected by AIC (inconsistent, but the best criterion for this test) over a VAR in the log-levels of the data. Granger argues against using log-levels for monthly or quarterly economic data, but it seems to be accepted practice in order to have comparable scales and to limit uneven growth rates in the levels series.

If the maximum eigenvalue and trace tests tell you that the cointegrating rank is 1 or 2, then you want to specify a VECM (with the same lag length as you determined in the preceding step). Estimate this with OLS, using the cointegrating vector that corresponds to the largest eigenvalue from step one. Once the VECM is specified, look at the statistical significance, magnitude and sign of the error correction term's coefficient estimate in all 3 equations of your VECM system. This will tell you which variable is responsible for adjusting and correcting any disequilibria.
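
For reference, the system described above can be written out as follows. This is the standard textbook VECM form, and the symbols are introduced here for illustration (they are not from the thread): y_t stacks the three series, the columns of \beta are the cointegrating vectors, \alpha holds the error correction (adjustment) coefficients, c is a constant, and k is the lag length of the levels VAR, which leaves k-1 lagged differences.

```latex
\Delta y_t = c + \alpha \beta' y_{t-1}
           + \sum_{i=1}^{k-1} \Gamma_i \, \Delta y_{t-i} + \varepsilon_t,
\qquad
y_t = (\text{inflation}_t,\ \text{gdp}_t,\ \text{income}_t)'
```

With cointegrating rank r, both \alpha and \beta are 3 x r. Each row of \alpha belongs to one equation of the system, so a row that is small and statistically insignificant suggests that the corresponding variable does not adjust to deviations from the long-run relationship.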

Important: Include an unrestricted constant in the restricted VECM so that you're accounting for the growth properties of your GDP and income variables. Time trends are not needed but are sometimes used in the literature.

If you determine that your cointegrating matrix is full rank then something has gone wrong since these variables are definitely I(1). I still haven't figured out what this means.

Once you've done this it's good to look at the forecast error variance decompositions and the impulse response functions of the VECM. This will tell you the speed and direction of the impulse response of one series to an unexpected unit or standard deviation shock in another series. It may tell you that one variable in the system does not react to a shock to another variable which is also an interesting finding.

If you don't detect a cointegrating relationship in step one, you should simply specify a VAR in differences and conduct impulse response analysis. You cannot estimate IRFs with a VAR in levels (VARL) when the variables aren't cointegrated, because the IRF will not be root-n-consistent and will converge to a random number instead of the population value; the finite sample properties are not pretty either (however, one working paper suggests it's fine for < 10 time steps). Alternatively, you should apply Johansen (2000) and Joyeux (2007) and specify an exogenous break in the trend or levels of the cointegrating equation or in the unrestricted constant. If there is no gigantic financial crisis or other "obvious" exogenous shift, you should determine the break endogenously with Lütkepohl, Saikkonen and Trenkler (2004). Another way would be to go back to Engle-Granger (1987) and use the Gregory and Hansen (1996) method for detecting an endogenous level shift. Alternatively, you could use the Seo (2006) test for a null of no linear cointegration against an alternative of threshold cointegration. Note that even if you find linear cointegration in step 1, it's good to use Hansen and Seo (2002) to test a null of linear cointegration against an alternative of threshold cointegration, so that you're modelling the true data generating process as accurately as possible.

Once you've done this, it's interesting to look at the "causal" properties of the system through Granger causality and instantaneous causality. Both of these are tests that look at whether the MSE of an optimal prediction of one series is improved by adding the next-period observation of an alternative series to the information set (instantaneous causality) or, alternatively, the entire time series of an alternative series to the information set (Granger causality). To test for Granger causality, it's by far the best to apply Toda and Yamamoto (1995), where you run a VAR in levels of order p+m+l, where p is the optimal SBIC lag order for the VAR in levels (VARL), m is the maximum order of integration of any of the variables (i.e., 1 in the data that you're using), and l is the number of additional lags needed to ensure, by an LM test, that the residuals are not serially correlated. This process ensures that the Wald statistic for the test that the first p+l lags of one series are jointly zero is asymptotically chi-squared distributed (and it also has nice finite sample properties). So once you reject the null hypothesis, you have determined Granger causality.
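
The lag bookkeeping in that procedure is easy to get wrong, so here is a minimal sketch of it. The function name `toda_yamamoto_plan` and the example values of p, m and l are hypothetical, chosen only for illustration; the sketch does not estimate anything, it only spells out which lags are fitted and which are tested.

```python
def toda_yamamoto_plan(p, m, l=0):
    """Lag bookkeeping for the augmented levels VAR described above.

    p: lag order chosen by the information criterion
    m: maximum order of integration among the variables
    l: extra lags added to remove residual serial correlation
    """
    if p < 1 or m < 0 or l < 0:
        raise ValueError("need p >= 1, m >= 0 and l >= 0")
    fitted_order = p + m + l
    # Lags whose coefficients enter the Wald test.
    tested_lags = list(range(1, p + l + 1))
    # The last m lags are estimated but left out of the test.
    untested_lags = list(range(p + l + 1, fitted_order + 1))
    return {
        "fitted_order": fitted_order,
        "tested_lags": tested_lags,
        "untested_lags": untested_lags,
        # One restriction per tested lag when testing a single
        # "causing" variable in a single equation.
        "chi2_df": len(tested_lags),
    }

# Hypothetical example: p = 2, I(1) data so m = 1, one extra lag so l = 1.
plan = toda_yamamoto_plan(p=2, m=1, l=1)
print(plan)
```

The extra m lags are what keep the Wald statistic's limiting distribution standard, which is why they are fitted but never included in the restriction.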

For instantaneous causality, look at the causality(.) function in the R vars package.

Advanced modelling options that you may be interested in (depending on the data, theory and research question) include: time-varying cointegration using Chebyshev polynomials (for which Matlab and GAUSS code is available); Granger causality with structural breaks; non-linear Granger causality; controlling for exogenous variables; pre-processing the time series with signal processing methods before you do any analysis (e.g., extracting a cyclical I(0) component from an I(1) trending variable); Markov switching VECMs; structural VECs/VARs; having the regime state be determined by an exogenous variable; smooth transition VARs/VECs.
My regression is actually to do with the determinants of inflation. I'm going to have CPI as my dependent variable, and GNP, income, oil price, interest rates, unemployment and money stock (M2) as my independent variables.
I plan on running a unit root test (Dickey-Fuller test). If my series are non-stationary, I'll take their first differences and then run an OLS on that. I also plan to check for cointegration using Engle-Granger.
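
As a small illustration of the differencing step in this plan, here is a sketch using a made-up trending series (the values are hypothetical, not real data). It only shows the transformation itself, not the unit root test.

```python
def first_difference(series):
    """Return x[t] - x[t-1]; the result is one observation shorter than the input."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical trending series (illustrative only): a level that rises every period.
level = [100.0, 103.0, 107.0, 112.0, 118.0]
diff = first_difference(level)
print(diff)
```

Each round of differencing or lagging costs one observation at the start of the sample, which is worth keeping in mind with short quarterly series.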

Does this all sound OK, or am I doing it in the wrong order? And just to confirm: unemployment and interest rates are percentages, so to get rid of that issue should I log everything? I've been told by my mentor to stay away from VAR/VEC, but what do you recommend?
I think in general you should run OLS at the beginning of the study to get a preliminary look at the short run relationship, even if your plan is to use var/vec.

Your mentor is definitely much more experienced than I am (I am still a student). But I cannot understand what they would mean by staying away from VAR/VEC. In most applications with these variables, OLS is more limited than a VECM or a VAR. The only reason I see to leave VARs/VECs aside is if ALL you're interested in is the contemporaneous relationship.

If you want to get a complete picture of the system, then use VARs/VECMs. It is very rare (indeed false for these variables) that economic variables' relationship can be summarized well by their instantaneous linkages. Just look at the impulse response functions of already existing studies that look at these variables.

To conclude, here are four reasons why you would leave aside VARs/VECs:
- You're a beginner and your mentor is just teaching you the basics of OLS. In that case, ignore the correct way to proceed described here and just do what they say.
- The theory you're testing only concerns the contemporaneous relationship.
- Your mentor is wrong but you don't want to insult them.
- They have a comparable methodology that they will explain to you after you produce the preliminary OLS results (e.g., directed acyclic graphs, spectral analysis, mutual information).

Two other comments:
-> You shouldn't use Engle-Granger. Use Johansen to check for cointegration. This is better for > 2 variables since it allows for more than one possible cointegrating vector. Also in my research with similar variables I've found that Johansen is a much more statistically powerful test (i.e., can detect more cointegrating cases for more countries). It is also much better accepted in the literature and it would be difficult to publish with Engle-Granger-Yoo for more than a bivariate case.
-> Why would you test for cointegration if you're not going to use VARs/VECMs?
Hi, I'm constructing a VECM regression and need help interpreting the results. It would be much appreciated if you could help me.

I'm testing determinants of inflation. I've run unit root tests along with a cointegration test and am now struggling to understand my VECM results.

My dependent variable is CPI, so in my VEC model should I only be looking at the first error correction equation, labelled D(CPI) on the horizontal axis, and using the coefficients of the variables under that heading?

I'm also using the standard 2 lags, and I don't understand why the coefficient sign changes for a variable with a lag. For example, D(OIL(-1)) = 0.0376, which I believe makes economic sense, but D(OIL(-2)) = -0.006626, which from what I know goes against economic theory. This pattern happens for most of my variables.

Also, do these coefficients represent the speed of adjustment to restore long-run equilibrium?

Another question: what do the coefficients under CointEq1, 2, 3 represent, e.g. UNEM(-1) 3.16834?

Here is some of the model:

Vector Error Correction Estimates
Date: 01/26/13 Time: 02:12
Sample (adjusted): 1970Q4 2011Q4
Included observations: 165 after adjustments
Standard errors in ( ) & t-statistics in [ ]

Cointegrating Eq:     CointEq1    CointEq2    CointEq3

CPI(-1)               1.000000    0.000000    0.000000
INT2(-1)              0.000000    1.000000    0.000000
M2(-1)                0.000000    0.000000    1.000000

OIL(-1)              -0.709087    0.012550   -9.81E+10
                     (0.13954)   (0.02878)   (1.3E+10)
                    [-5.08153]  [ 0.43599]  [-7.60147]

RGDP(-1)             -9.08E-12    9.72E-13   -0.410964
                     (1.0E-12)   (2.1E-13)   (0.09211)
                    [-9.11989]  [ 4.73313]  [-4.46179]

UNEM(-1)              3.168342   -0.348349    3.94E+11
                     (1.18834)   (0.24513)   (1.1E+11)
                    [ 2.66620]  [-1.42110]  [ 3.58698]

C                     11.66774   -12.45975    2.93E+11

Error Correction:       D(CPI)     D(INT2)       D(M2)

CointEq1             -0.016302   -0.021186   -1.67E+09
                     (0.00716)   (0.01332)   (5.4E+08)
                    [-2.27662]  [-1.59089]  [-3.09419]

CointEq2              0.109832   -0.070831    6.00E+08
                     (0.02082)   (0.03871)   (1.6E+09)
                    [ 5.27618]  [-1.82968]  [ 0.38287]

CointEq3              1.87E-13    1.18E-13    0.007580
                     (8.8E-14)   (1.6E-13)   (0.00663)
                    [ 2.12164]  [ 0.71843]  [ 1.14339]

D(CPI(-1))           -0.003501    0.174649    2.28E+10
                     (0.11985)   (0.22289)   (9.0E+09)
                    [-0.02921]  [ 0.78357]  [ 2.52581]

D(CPI(-2))           -0.392062    0.584066   -1.65E+09
                     (0.12175)   (0.22641)   (9.2E+09)
                    [-3.22027]  [ 2.57965]  [-0.18012]

D(INT2(-1))           0.003713    0.157560   -1.21E+10
                     (0.04913)   (0.09137)   (3.7E+09)
                    [ 0.07557]  [ 1.72445]  [-3.28008]

D(INT2(-2))          -0.019054   -0.373610   -5.89E+09
                     (0.05211)   (0.09690)   (3.9E+09)
                    [-0.36567]  [-3.85552]  [-1.50139]

D(M2(-1))             1.52E-12   -2.24E-12    0.216682
                     (1.2E-12)   (2.3E-12)   (0.09364)
                    [ 1.22494]  [-0.96690]  [ 2.31393]

D(M2(-2))            -5.07E-12   -2.04E-12    0.269365
                     (1.4E-12)   (2.5E-12)   (0.10275)
                    [-3.71738]  [-0.80346]  [ 2.62153]

D(OIL(-1))            0.037611   -0.011235   -2.99E+09
                     (0.01018)   (0.01893)   (7.7E+08)
                    [ 3.69586]  [-0.59365]  [-3.90661]

D(OIL(-2))           -0.006626   -0.054824    1.90E+09
                     (0.01060)   (0.01971)   (8.0E+08)
                    [-0.62535]  [-2.78221]  [ 2.38376]
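
As a quick arithmetic check on how the output above is laid out (a sketch, not part of the EViews output), the snippet below recomputes a few t-statistics in the D(CPI) column as coefficient divided by standard error and compares them with 1.96, the approximate large-sample two-sided 5% critical value. It also turns the CointEq1 coefficient in the D(CPI) equation into a rough half-life. That last step rests on a strong simplifying assumption that is not from the thread: the deviation is treated as decaying by a constant factor (1 + alpha) each quarter, ignoring the short-run terms and the other two cointegrating relations.

```python
import math

# (coefficient, standard error, reported t-statistic) from the D(CPI) column above.
d_cpi = {
    "CointEq1": (-0.016302, 0.00716, -2.27662),
    "D(OIL(-1))": (0.037611, 0.01018, 3.69586),
    "D(OIL(-2))": (-0.006626, 0.01060, -0.62535),
}

CRITICAL_5PCT = 1.96  # approximate large-sample two-sided 5% critical value

def t_stat(coef, se):
    """t-statistic as coefficient divided by its standard error."""
    return coef / se

for name, (coef, se, reported) in d_cpi.items():
    t = t_stat(coef, se)
    flag = "significant" if abs(t) > CRITICAL_5PCT else "not significant"
    print(f"{name}: t = {t:.3f} (reported {reported}), {flag} at roughly 5%")

def half_life(alpha):
    """Periods for a gap to halve if it shrinks by the factor (1 + alpha) each period.

    Simplifying assumption: no short-run dynamics, one adjustment channel only.
    """
    if not -1.0 < alpha < 0.0:
        raise ValueError("alpha must lie strictly between -1 and 0")
    return math.log(0.5) / math.log(1.0 + alpha)

alpha_cpi = d_cpi["CointEq1"][0]
print(f"share of last quarter's CointEq1 gap corrected in D(CPI): {-alpha_cpi:.1%}")
print(f"rough half-life: {half_life(alpha_cpi):.1f} quarters")
```

Small differences between the recomputed and reported t-statistics come from the standard errors being rounded in the printout.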