1. ## Lagged Dependent Variable

First, what is a lagged dependent variable? Is it the value from the previous period, so that, say, consumption today depends on consumption yesterday?

Second, how do we make a lagged dependent variable part of a multiple regression in R?

Third, if we can make it part of the lm model, does that mean there is also a corresponding coefficient for it when we call coefficients(lmfit)?

Thanks.

2. ## Re: Lagged Dependent Variable

Suppose your dependent variable is consumption. As you've said, if consumption today has an effect on consumption at future time points, then the observed values of consumption will be correlated over time (this is called autocorrelation). To reduce the autocorrelation left in the residuals, lagged values of the dependent variable can be added to the model as explanatory variables.
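In equation form, a regression with a one-period lag of the dependent variable can be sketched as below. The symbols are my own labels, not taken from the post: y_t is the dependent variable at time t, x_t is another regressor, and the coefficient on y_{t-1} is the one reported for the lag term.

```latex
y_t = \beta_0 + \beta_1 x_t + \beta_2 y_{t-1} + \varepsilon_t
```

So the lag shows up as an ordinary regressor with its own coefficient, here \beta_2.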

In R, there is a package called "dyn" which does this.

Code:
library(dyn)

# example data
data<-structure(list(y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L,
34L), x = c(3L, 4L, 2L, 4L, 2L, 5L, 2L, 4L, 5L)), .Names = c("y",
"x"), class = "data.frame", row.names = c(NA, -9L))

y x
1 34 3
2 24 4
3 35 2
4 53 4
5 24 2
6 68 5
7 86 2
8 73 4
9 34 5

# Specify time series properties
y_1 <- ts(data$y)
x_1 <- ts(data$x)

# Fit the lagged dependent variable as an explanatory variable
m1 <- dyn$lm(y_1 ~ x_1 + lag(y_1, -1))
summary(m1)

Call:
lm(formula = dyn(y_1 ~ x_1 + lag(y_1, -1)))

Residuals:
2       3       4       5       6       7       8       9
-6.882  -3.674  10.072 -22.003  11.005  30.310  14.235 -33.062

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   24.3952    22.8796   1.066    0.335
x_1            6.1071     6.2471   0.978    0.373
lag(y_1, -1)  -0.1685     0.6409  -0.263    0.803  # coeff for lag

Residual standard error: 24.42 on 5 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.2526,     Adjusted R-squared: -0.04633
F-statistic: 0.845 on 2 and 5 DF,  p-value: 0.4829

#----------------------------------------------------------------------------------------------------------------------------------------------
# Not a very good example/model, as the adjusted R-squared is negative. A non-lagged linear model could have done better, I think (for this not-so-good example). In fact, we can test that with a simple F test.

# non-lagged
m2 <- dyn$lm(y_1 ~ x_1)
summary(m2)

# compare m2 against m1 (m2 is nested within m1)
anova(m2, m1)

> anova(m2,m1)
Analysis of Variance Table

Model 1: y_1 ~ x_1
Model 2: y_1 ~ x_1 + lag(y_1, -1)
Res.Df    RSS Df Sum of Sq      F Pr(>F)
1      6 3023.2
2      5 2982.0  1    41.195 0.0691 0.8032
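
For readers without the dyn package, a base-R sketch of the same comparison might look like the following. It assumes that shifting the series by hand with c(NA, head(y, -1)) is an acceptable stand-in for lag(y_1, -1); the names d, d_complete, y_lag1, m_lag and m_nolag are mine, not from the post.

```r
# Example data from the post above
d <- data.frame(
  y = c(34, 24, 35, 53, 24, 68, 86, 73, 34),
  x = c(3, 4, 2, 4, 2, 5, 2, 4, 5)
)

# Lag-1 series: each row gets the previous row's y; the first row has none
d$y_lag1 <- c(NA, head(d$y, -1))

# Keep only rows where the lag exists, so both models use the same 8 rows
d_complete <- d[!is.na(d$y_lag1), ]

m_lag   <- lm(y ~ x + y_lag1, data = d_complete)  # with the lagged dependent variable
m_nolag <- lm(y ~ x, data = d_complete)           # without it

coef(m_lag)            # includes a coefficient named y_lag1
anova(m_nolag, m_lag)  # F test for adding the lag
```

Fitting both models on d_complete matters because anova() refuses to compare models fitted to different numbers of observations.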

3. ## The Following User Says Thank You to ledzep For This Useful Post:

dEconomist (03-05-2012)

4. ## Re: Lagged Dependent Variable

Thank you so much.

How can I include the lagged dependent variable in my existing formula:

lmfit1 <- lm(Data1$C ~ Data1$Y + qtr)

where Data1$C is the consumption column of a data frame, Data1$Y is the column for personal disposable income, and qtr holds the 4 quarterly dummy variables.

Do I have to put an L next to the figures, as you have in y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L,
34L)?

5. ## Re: Lagged Dependent Variable

Originally Posted by dEconomist
There must be, but I am not too sure of.

Do I have to put an L next to the figures, as you have in y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L,
34L)?
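No. You don't have to worry about those L's. The L suffix just tells R that the number is an integer literal, which is how integer values are printed by functions like dput(). Plain numbers such as 34 work exactly the same way in a regression.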
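
A quick check of the point about the L suffix, as a small sketch (the vector names y_int and y_dbl are made up for illustration):

```r
y_int <- c(34L, 24L, 35L)  # integer literals, as printed by dput()
y_dbl <- c(34, 24, 35)     # plain numbers, stored as doubles

class(y_int)         # "integer"
class(y_dbl)         # "numeric"
all(y_int == y_dbl)  # TRUE: the values are the same either way
```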

How can I include the lagged dependent variable of my existing formula:

lmfit1 <- lm(Data1$C ~ Data1$Y + qtr)
So you have four quarters. In that case, would you want to add a separate lag for each quarter?
I'm afraid I am not sure how to add different lags for different quarters (and I don't want to give you a vague answer). We never went beyond a simple one-page example in our course. I wish more of it had been covered.

6. ## Re: Lagged Dependent Variable

I've never used the dyn package, but I think having your data in a time series object (ts) makes these sorts of regressions easier. In any case, I'm not going to sit here and try to explain the whole theory behind autocorrelation (and I hope you already know multiple regression). The basic idea, though, is that you literally put a variable into your model that holds the values from the prior period(s); that is the common approach. This changes your error structure, though, because now the error depends on previous periods (the algebra isn't that hard).

I've not done too many of these in practice, but when playing around with lagged variables in R, I usually just use index sequences in a convenient way that mirrors the usual time-series notation.

For instance, suppose my dependent variable is 'y' and has length n (length(y) == n # reports TRUE). Then I make myself an index

Code:
t <- 2:n
Why did I choose 2? Because with a one-period lag the first observation has no predecessor, so the usable series has n-1 points. This also makes it convenient to deal with the sequence 1, ..., n-1: all I have to do is look at t-1. R handles the vector arithmetic by subtracting 1 from each element. In other words, t is 2:n and t-1 is 1:(n-1). This gives us our current series y[t] and our lagged series y[t-1]. Nice syntax, right? So now I fit my lagged model with something like

Code:
fit <- lm(y[t] ~ x[t] + qtr[t] + y[t-1], df)
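
To see what the index does, here is a small sketch using the y values from the example data earlier in the thread (any numeric vector would do):

```r
y <- c(34, 24, 35, 53, 24, 68, 86, 73, 34)
n <- length(y)

t <- 2:n   # positions of the "current" observations
t - 1      # positions of the observations one period earlier

# Each row pairs an observation with its predecessor
cbind(current = y[t], lagged = y[t - 1])
```

Both columns have n - 1 rows, which is why every variable in the formula has to be indexed the same way.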
There are built-in functions that handle the shifting for you, in the sense that you can specify the lag you want on a variable: lag (see ?lag) shifts a ts object, while diff (see ?diff) returns lagged differences such as y[t] - y[t-1] rather than the lagged series itself. The problem is that they are useful for a given variable, but controlling 't' like I do lets me easily supply it to my other vectors. I can also use it to apply to the dataframe itself. In this respect, I might do something like

Code:
fit <- lm(y ~ x + qtr + lagy, cbind(df[t, ], lagy = df$y[t-1]))
Here I am returning only the t-row subset of df and creating the lag variable (named lagy, matching the name used in the formula) on the fly. In this respect, it might be useful to use diff.
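
Applied to a quarterly consumption model like the one asked about earlier, a runnable sketch might look like this. The data are simulated purely for illustration, and the column names C, Y and qtr are assumptions chosen to mirror the question; lagC and lagged are names I made up.

```r
set.seed(1)  # simulated data, for illustration only
n <- 24      # six years of quarterly observations

df <- data.frame(
  Y   = 100 + cumsum(runif(n, 1, 3)),      # made-up income series
  qtr = factor(rep(1:4, length.out = n))   # quarter; lm() expands a factor into dummies
)
df$C <- 10 + 0.8 * df$Y + rnorm(n)         # made-up consumption series

t <- 2:n
lagged <- cbind(df[t, ], lagC = df$C[t - 1])  # rows 2..n plus last quarter's consumption

fit <- lm(C ~ Y + qtr + lagC, data = lagged)
coef(fit)  # includes one coefficient for lagC
```

Keeping qtr as a factor means the dummy variables do not have to be built by hand, and the lag gets its own coefficient alongside them.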

7. ## The Following 2 Users Say Thank You to bryangoodrich For This Useful Post:

dEconomist (03-05-2012), ledzep (03-04-2012)

8. ## Re: Lagged Dependent Variable

Thaaaaankies again!!!!

To bryangoodrich:

You are so amazing to read what my mind needs. Thanks >.<

9. ## Re: Lagged Dependent Variable

Allow me to do a follow-up question:

Aside from literally encoding the lagged variables myself, is there a way to make R print them? For example, when you taught me about letting R build the dummy variables, model.matrix(~Data2+qtr-1) printed the data along with the dummy variables as additional columns.

If so, may I have the code?

Thanks.

10. ## Re: Lagged Dependent Variable

On any regression, you can use model.matrix to return the X matrix used in the regression Y ~ Xb. Another useful function is model.frame, which returns the data frame used in the regression.
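
As a small sketch of both functions on a lagged fit, using the example data from earlier in the thread (the names d, lagged and lagy are illustrative):

```r
d <- data.frame(
  y = c(34, 24, 35, 53, 24, 68, 86, 73, 34),
  x = c(3, 4, 2, 4, 2, 5, 2, 4, 5)
)

t <- 2:nrow(d)
lagged <- cbind(d[t, ], lagy = d$y[t - 1])
fit <- lm(y ~ x + lagy, data = lagged)

head(model.matrix(fit))  # X matrix: (Intercept), x, lagy
head(model.frame(fit))   # data actually used: y, x, lagy
```

The lagy column in either output is the lagged variable, printed next to the other regressors.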

11. ## The Following User Says Thank You to bryangoodrich For This Useful Post:

dEconomist (03-05-2012)

12. ## Re: Lagged Dependent Variable

I see.

Thaaankieees, bryangoodrich!
