Is it best to match data units to get a logical regression relationship?

#1
Hi,

I'm trying to find an economic relationship between inflation (CPI) and leakages from the economy such as Taxes, Savings, and Imports. It turns out Imports has the expected negative sign, but Savings and Taxes show a positive relationship with CPI. How can that be? Shouldn't taking more money out of the economy be disinflationary? Here are my data and results. All the data are indexed. Does it matter if the data are not indexed? I chose to index them to match the CPI units. I also ran the regression on the non-indexed units and still got the same relationships.



My data (columns, in the order of the READ statement in #2: CPIU, MWAGE, TAXES, IMPORT, SAVING)

72.600 2.90 879.2 751.0 140.3
82.400 3.10 960.1 701.1 150.5
90.900 3.35 1100.5 719.7 241.1
96.500 3.35 1043.1 710.6 234.5
99.600 3.35 1061.2 800.2 438.3
103.900 3.35 1147.4 995.1 473.2
107.600 3.35 1241.8 1059.6 515.8
109.600 3.35 1293.5 1150.1 585.5
113.600 3.35 1466.8 1218.2 640.0
118.300 3.35 1528.8 1266.0 631.6
124.000 3.35 1678.7 1321.8 589.1
130.700 3.80 1736.4 1369.1 624.0
136.200 4.25 1719.4 1367.0 658.9
140.300 4.25 1784.9 1463.0 759.7
144.500 4.25 1928.0 1589.4 804.9
148.200 4.25 2112.9 1778.9 805.2
152.400 4.25 2288.7 1921.3 743.4
156.900 4.75 2525.6 2088.4 825.8
160.500 5.15 2791.9 2369.6 895.3
163.000 5.15 3023.7 2646.7 1005.3
166.600 5.15 3234.5 2915.0 1135.8
172.200 5.15 3541.5 3294.5 1203.9
177.100 5.15 3379.8 3201.1 1423.4
179.880 5.15 2904.9 3318.4 1710.2
183.960 5.15 2898.9 3466.6 2016.2
188.900 5.15 3123.2 3862.0 2258.2
195.300 5.15 3744.7 4106.6 2368.2
201.600 5.15 4214.0 4366.2 2429.9
207.342 5.85 4427.0 4476.3 2589.7
215.303 6.55 4098.9 4361.7 2677.9

I've tried everything from double log to square roots, since the graph seems to show a bit of a curve, but nothing can change the signs of Savings and Taxes.
 
#2
Here are my regression results in case the picture doesn't work:

|_SAMPLE 1 30
|_READ CPIU MWAGE TAXES IMPORT SAVING
5 VARIABLES AND 30 OBSERVATIONS STARTING AT OBS 1

|_GENR LCPIU=LOG(CPIU)
|_GENR LTAXES=LOG(TAXES)
|_GENR LMWAGE=LOG(MWAGE)
|_GENR LSAVING=LOG(SAVING)
|_GENR LIMPORT=LOG(IMPORT)

|_OLS LCPIU LTAXES LIMPORT LSAVING

REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 4000
OLS ESTIMATION
30 OBSERVATIONS DEPENDENT VARIABLE= LCPIU
...NOTE..SAMPLE RANGE SET TO: 1, 30

R-SQUARE = 0.9833 R-SQUARE ADJUSTED = 0.9813
VARIANCE OF THE ESTIMATE-SIGMA**2 = 0.15888E-02
STANDARD ERROR OF THE ESTIMATE-SIGMA = 0.39859E-01
SUM OF SQUARED ERRORS-SSE= 0.41308E-01
MEAN OF DEPENDENT VARIABLE = 4.9352
LOG OF THE LIKELIHOOD FUNCTION = 56.2505


VARIABLE   ESTIMATED    STANDARD    T-RATIO            PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       26 DF     P-VALUE  CORR.    COEFFICIENT   AT MEANS
LTAXES      0.50949     0.8958E-01   5.688    0.000     0.745    0.8657        0.7872
LIMPORT    -0.19473     0.8823E-01  -2.207    0.036    -0.397   -0.4125       -0.2959
LSAVING     0.20329     0.3197E-01   6.359    0.000     0.780    0.5559        0.2753
CONSTANT    1.1518      0.1821       6.324    0.000     0.778    0.0000        0.2334
|_STOP
TYPE COMMAND
 
#3
This exercise needs rethinking. The problem is that all the variables are closely and positively related to CPI through inflation: as CPI increases over time, all the other variables increase along with it, so Imports is actually positively correlated with CPI. Transforming the data won't help.
Somebody may have a suggestion on how to get what you want out of this data. Perhaps you could put exponential curves through them all and see if you can relate the residuals. When CPI is higher than expected, are imports consistently higher or lower?
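
The residual comparison suggested above can be sketched as follows. This is only an illustration in Python, not something from the thread: it fits a log-linear (exponential) trend to CPIU and IMPORT by least squares and correlates the deviations from trend. The function names are made up, and the column assignment assumes the data in #1 follow the order of the READ statement in #2.

```python
import math

def log_trend_residuals(series):
    """Residuals from a least-squares fit of log(series) on a linear time index."""
    n = len(series)
    t = list(range(n))
    y = [math.log(v) for v in series]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Columns 1 and 4 of the data in #1 (assumed to be CPIU and IMPORT).
cpiu = [72.6, 82.4, 90.9, 96.5, 99.6, 103.9, 107.6, 109.6, 113.6, 118.3,
        124.0, 130.7, 136.2, 140.3, 144.5, 148.2, 152.4, 156.9, 160.5, 163.0,
        166.6, 172.2, 177.1, 179.88, 183.96, 188.9, 195.3, 201.6, 207.342, 215.303]
imports = [751.0, 701.1, 719.7, 710.6, 800.2, 995.1, 1059.6, 1150.1, 1218.2, 1266.0,
           1321.8, 1369.1, 1367.0, 1463.0, 1589.4, 1778.9, 1921.3, 2088.4, 2369.6, 2646.7,
           2915.0, 3294.5, 3201.1, 3318.4, 3466.6, 3862.0, 4106.6, 4366.2, 4476.3, 4361.7]

# Correlate deviations from trend rather than the trending levels themselves.
r = pearson(log_trend_residuals(cpiu), log_trend_residuals(imports))
print(f"correlation of detrended log CPIU and log IMPORT: {r:.3f}")
```

The sign of that correlation addresses the question of whether imports run high or low when CPI is above its trend. With only 30 observations it is a rough diagnostic, not a test.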
 
#4
Hm, so you are saying that they are positively related? Then maybe my model is not wrong after all, except for Imports. I know there's a thing called imported inflation, when the supply of domestic currency increases in the foreign exchange market; I think that's how it works. I looked at the graph and it already shows a roughly linear relationship between each independent variable and the dependent variable, so transforming the data seems unnecessary.

I've tried the exponential function but it didn't work on the data, so I tried a quadratic function and it gave me higher p-values. So it could be true that transforming is unnecessary at this point. Maybe interaction terms would work? But when I multiply Taxes and Savings together I get a negative sign, all the p-values are low, and Imports is still negative. With a lin-log transformation, though, all the variables show a positive relationship, but Imports has a p-value of .3.
 
#5
It seems like your data are integrated (non-stationary), so you will get spurious relationships. In that case it is common to take first differences of the series, which may make them stationary. Inflation, by the way, is the difference, or rather the percentage change, in the consumer price index, not the index itself.
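
As a small illustration of that last point, here is a sketch in Python (not part of the thread) that turns the first few CPIU values from #1 into an inflation rate, both as a percentage change and as a log difference. The variable names are made up for the example.

```python
import math

# First six CPIU values from the data posted in #1.
cpiu = [72.6, 82.4, 90.9, 96.5, 99.6, 103.9]

# Inflation as the percentage change in the index between consecutive periods.
inflation_pct = [100.0 * (curr - prev) / prev for prev, curr in zip(cpiu, cpiu[1:])]

# Log difference: close to the percentage change when the change is small.
log_diff_pct = [100.0 * (math.log(curr) - math.log(prev))
                for prev, curr in zip(cpiu, cpiu[1:])]

for pct, ld in zip(inflation_pct, log_diff_pct):
    print(f"pct change {pct:6.2f}%   log difference {ld:6.2f}%")
```

Either series has one observation fewer than the index, because the first period has no predecessor to compare against.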

Are you not missing some relations like:

Y = C + I + G + X - M

That is how I believe they commonly write it in elementary economics books. To exclude important relationships can bias the estimates.

What time period and country are the data from?
 
#6
Hm, I did leave out the other variables, like the injections into the economy, because I just wanted to find the negative relationships between inflation and the leakages from the economy. If I include spending, would it be better to use 2SLS rather than OLS, or can I still use OLS? I tried taking the first difference of the dependent variable and my R-square dropped to about .57. Does R-square matter in this case for first differences? My rho got lower and my Durbin-Watson got higher, which is good: I got rid of some serial correlation. I kept the lin-log functional form, though. Is it okay to mix transformations with first differences? It keeps my p-values lower than when I take away the log transformations. Here are my results:

|_SAMPLE 1 30
|_READ CPIU MWAGE TAXES IMPORT SAVING
5 VARIABLES AND 30 OBSERVATIONS STARTING AT OBS 1

|_GENR LTAXES=LOG(TAXES)
|_GENR LSAVING=LOG(SAVING)
|_GENR LIMPORT=LOG(IMPORT)
|_GENR CPIU1=LAG(CPIU,1)
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
|_GENR FDCPIU=CPIU-CPIU1

|_OLS FDCPIU LTAXES LIMPORT LSAVING/LIST

REQUIRED MEMORY IS PAR= 6 CURRENT PAR= 4000
OLS ESTIMATION
30 OBSERVATIONS DEPENDENT VARIABLE= FDCPIU
...NOTE..SAMPLE RANGE SET TO: 1, 30

R-SQUARE = 0.5725 R-SQUARE ADJUSTED = 0.5231
VARIANCE OF THE ESTIMATE-SIGMA**2 = 74.261
STANDARD ERROR OF THE ESTIMATE-SIGMA = 8.6175
SUM OF SQUARED ERRORS-SSE= 1930.8
MEAN OF DEPENDENT VARIABLE = 7.1768
LOG OF THE LIKELIHOOD FUNCTION = -105.035


VARIABLE   ESTIMATED    STANDARD   T-RATIO            PARTIAL  STANDARDIZED
NAME       COEFFICIENT  ERROR      26 DF     P-VALUE  CORR.    COEFFICIENT
LTAXES     -64.357      19.37      -3.323    0.003    -0.546   -2.5558
LIMPORT     87.298      19.07       4.577    0.000     0.668    4.3227
LSAVING    -34.079      6.912      -4.930    0.000    -0.695   -2.1781
CONSTANT    71.064      39.38       1.805    0.083     0.334    0.0000
OBS. OBSERVED PREDICTED CALCULATED
NO. VALUE VALUE RESIDUAL
1 72.600 44.346 28.254 I X
2 9.8000 30.287 -20.487 * I
3 8.5000 7.7297 0.77026 *
4 5.6000 11.012 -5.4122 * I
5 3.1000 -1.0425 4.1425 I *
6 4.3000 10.350 -6.0499 * I
7 3.7000 7.8066 -4.1066 * I
8 2.0000 8.0169 -6.0169 * I
9 4.0000 1.9139 2.0861 I*
10 4.7000 3.0597 1.6403 I*
11 5.7000 3.1793 2.5207 I*
12 6.7000 2.1123 4.5877 I *
13 5.5000 0.75691 4.7431 I *
14 4.1000 -0.57538 4.6754 I *
15 4.2000 -0.27401 4.4740 I *
16 3.7000 3.6527 0.47254E-01 *
17 4.2000 7.9531 -3.7531 * I
18 4.5000 5.3124 -0.81235 *
19 3.6000 7.1350 -3.5350 * I
20 2.5000 7.7073 -5.2073 * I
21 3.6000 7.6400 -4.0400 * I
22 5.6000 10.504 -4.9039 * I
23 4.9000 5.2933 -0.39333 *
24 2.7800 11.924 -9.1443 * I
25 4.0800 10.262 -6.1821 * I
26 4.9400 11.032 -6.0920 * I
27 6.4000 3.0924 3.3076 I *
28 6.3000 -0.31623E-01 6.3316 I *
29 5.7420 -3.2015 8.9435 I *
30 7.9610 -1.6512 9.6122 I *

DURBIN-WATSON = 1.7635 VON NEUMANN RATIO = 1.8243 RHO = -0.11803
RESIDUAL SUM = 0.14264E-11 RESIDUAL VARIANCE = 74.261
SUM OF ABSOLUTE ERRORS= 172.27
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.5725
RUNS TEST: 9 RUNS, 15 POS, 0 ZERO, 15 NEG NORMAL STATISTIC = -2.6013
|_STOP
TYPE COMMAND

I shall add in the spending and investment and see if that improves my model. I think the independent variables will be highly correlated.
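
One detail in the listing above is worth checking: observation 1 has an observed value of 72.600, which is the CPIU level itself rather than a difference. That is consistent with the note that the undefined lag was set to zero. The Python sketch below (an illustration only, with made-up variable names) shows the effect on the first few CPIU values and the usual alternative of dropping the first observation. In SHAZAM that would presumably mean restricting the sample to start at observation 2 before running OLS, but that is an assumption, not something shown in the thread.

```python
# First five CPIU values from the data in #1.
cpiu = [72.6, 82.4, 90.9, 96.5, 99.6]

# Lag padded with zero for the undefined first observation (what the NOTE describes).
lag_zero_filled = [0.0] + cpiu[:-1]
fd_zero_filled = [c - l for c, l in zip(cpiu, lag_zero_filled)]

# First difference computed only where a lag exists; one observation is lost.
fd_valid = [curr - prev for prev, curr in zip(cpiu, cpiu[1:])]

print("zero-filled lag:", [round(v, 3) for v in fd_zero_filled])
print("valid lags only:", [round(v, 3) for v in fd_valid])
```

The first entry of the zero-filled version is the level 72.6, roughly ten times the size of any genuine difference in the series, so it acts as an outlier in the regression.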
 
#7
Hm, this time I ran a regular linear regression, with the only transformation being the log of government spending, because it showed a negative relationship to inflation, and got this:

|_SAMPLE 1 30
|_READ CPIU MWAGE TAXES IMPORT SAVING PCE INVEST GSPEND
8 VARIABLES AND 30 OBSERVATIONS STARTING AT OBS 1

|_GENR LGSPEND=LOG(GSPEND)

|_OLS CPIU TAXES IMPORT SAVING PCE INVEST LGSPEND

REQUIRED MEMORY IS PAR= 7 CURRENT PAR= 4000
OLS ESTIMATION
30 OBSERVATIONS DEPENDENT VARIABLE= CPIU
...NOTE..SAMPLE RANGE SET TO: 1, 30

R-SQUARE = 0.9990 R-SQUARE ADJUSTED = 0.9988
VARIANCE OF THE ESTIMATE-SIGMA**2 = 1.8894
STANDARD ERROR OF THE ESTIMATE-SIGMA = 1.3745
SUM OF SQUARED ERRORS-SSE= 43.455
MEAN OF DEPENDENT VARIABLE = 144.66
LOG OF THE LIKELIHOOD FUNCTION = -48.1261


VARIABLE   ESTIMATED     STANDARD    T-RATIO            PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT   ERROR       23 DF     P-VALUE  CORR.    COEFFICIENT   AT MEANS
TAXES      -0.23395E-02  0.2273E-02  -1.029    0.314    -0.210   -0.0639       -0.0371
IMPORT     -0.10365E-01  0.4978E-02  -2.082    0.049    -0.398   -0.3310       -0.1544
SAVING     -0.13302E-01  0.2625E-02  -5.067    0.000    -0.726   -0.2578       -0.0968
PCE         0.54074E-01  0.8743E-02   6.185    0.000     0.790    1.1006        0.5973
INVEST      0.25119E-02  0.2456E-02   1.023    0.317     0.209    0.1169        0.0655
LGSPEND     33.166       4.369        7.591    0.000     0.845    0.4234        1.6795
CONSTANT   -152.46       25.82       -5.905    0.000    -0.776    0.0000       -1.0539

|_GRAPH CPIU GSPEND

REQUIRED MEMORY IS PAR= 5 CURRENT PAR= 4000
30 OBSERVATIONS
SHAZAM WILL NOW MAKE A PLOT FOR YOU
|_STOP
TYPE COMMAND

It looks like Savings, Taxes, and Imports now show a negative relationship to inflation, and Consumption, Government Spending, and Investment show positive signs, although the p-values for Taxes and Investment are about .3. After adding those variables to complete the GDP model, is it headed in the right direction? Keep in mind that I removed the first difference and got a rho of .5 and a Durbin-Watson h of .7843, which is close to indicating positive serial correlation.
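
For readers following the serial-correlation discussion, here is a minimal Python sketch of the plain Durbin-Watson d statistic and the first-order autocorrelation it roughly implies. It is not Durbin's h, and the residuals are hypothetical values chosen for illustration, not taken from any regression in this thread.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    num = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only.
smooth = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.8]  # neighbours alike: positive autocorrelation
zigzag = [1.0, -1.0, 1.0, -1.0]                  # sign flips each period: negative autocorrelation

for name, e in (("smooth", smooth), ("zigzag", zigzag)):
    d = durbin_watson(e)
    # d is approximately 2 * (1 - rho), so rho is approximately 1 - d / 2.
    print(f"{name}: d = {d:.3f}, implied rho ~ {1 - d / 2:.3f}")
```

Values of d well below 2 point toward positive serial correlation, values near 2 toward none, and values well above 2 toward negative serial correlation.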