
Thread: Lagged Dependent Variable

  1. #1

    Lagged Dependent Variable




    First, what is a lagged dependent variable? Is it the value of the dependent variable from the previous period, so that, say, consumption today depends on consumption yesterday?

    Second, how do we make a lagged dependent variable part of a multiple regression in R?

    Third, if we can make it part of the lm model, does that mean there is also a corresponding coefficient for it when we call coefficients(lmfit)?

    Thanks.

  2. #2
    ledzep

    Re: Lagged Dependent Variable

    Suppose your dependent variable is consumption. As you've said, if consumption today affects consumption at future time points, then the observed values of consumption will be correlated over time (this is called autocorrelation). To account for this autocorrelation, lagged values of the dependent variable can be included in the model as explanatory variables.

    In R, there is a package called "dyn" which does this.

    Code: 
    require(dyn);
    
    # example data
    data<-structure(list(y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L, 
    34L), x = c(3L, 4L, 2L, 4L, 2L, 5L, 2L, 4L, 5L)), .Names = c("y", 
    "x"), class = "data.frame", row.names = c(NA, -9L))
    
       y x
    1 34 3
    2 24 4
    3 35 2
    4 53 4
    5 24 2
    6 68 5
    7 86 2
    8 73 4
    9 34 5
    
    # Specify time series properties (y and x are columns of the data frame above)
    y_1 <- ts(data$y)
    x_1 <- ts(data$x)
    
    # Fit the lagged dependent variable as an explanatory variable
    m1<-dyn$lm(y_1 ~ x_1+lag(y_1, -1))
    summary(m1)
    
    Call:
    lm(formula = dyn(y_1 ~ x_1 + lag(y_1, -1)))
    
    Residuals:
          2       3       4       5       6       7       8       9 
     -6.882  -3.674  10.072 -22.003  11.005  30.310  14.235 -33.062 
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)   24.3952    22.8796   1.066    0.335
    x_1            6.1071     6.2471   0.978    0.373
    lag(y_1, -1)  -0.1685     0.6409  -0.263    0.803  # coeff for lag
    
    Residual standard error: 24.42 on 5 degrees of freedom
      (2 observations deleted due to missingness)
    Multiple R-squared: 0.2526,     Adjusted R-squared: -0.04633 
    F-statistic: 0.845 on 2 and 5 DF,  p-value: 0.4829 
    
    #----------------------------------------------------------------------------------------------------------------------------------------------
    # Not a very good example/model: the adjusted R-squared is negative and the lag coefficient is far from significant. A non-lagged linear model would probably do just as well here. This can be tested with a simple F test.
    
    # non-lagged
    m2<-dyn$lm(y_1 ~x_1)
    summary(m2)
    
    # compare m2 against m1 (m2 is nested within m1)
    anova(m2,m1)
    
    > anova(m2,m1)
    Analysis of Variance Table
    
    Model 1: y_1 ~ x_1
    Model 2: y_1 ~ x_1 + lag(y_1, -1)
      Res.Df    RSS Df Sum of Sq      F Pr(>F)
    1      6 3023.2                           
    2      5 2982.0  1    41.195 0.0691 0.8032
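
    For anyone who would rather not install dyn, here is a minimal base-R sketch of the same idea, assuming a one-period lag is all that is needed. The names dat, y_lag1 and fit_lag are made up for this illustration and are not from the post above. It also shows that the lag gets its own entry in coefficients(), which was the third question.

```r
# Same toy data as above, rebuilt so this block runs on its own.
dat <- data.frame(
  y = c(34, 24, 35, 53, 24, 68, 86, 73, 34),
  x = c(3, 4, 2, 4, 2, 5, 2, 4, 5)
)

# Build the one-period lag by hand. The first row has no predecessor, so it gets NA.
dat$y_lag1 <- c(NA, head(dat$y, -1))

# With the default na.action (na.omit), lm() drops the incomplete first row.
fit_lag <- lm(y ~ x + y_lag1, data = dat)

# The lagged variable gets its own coefficient, like any other regressor.
print(coefficients(fit_lag))

# Number of rows actually used in the fit (one fewer than in dat).
print(nobs(fit_lag))
```

    Keeping the lag as an ordinary column means the usual lm tools (summary, anova, predict) work without any special handling.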
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  3. The Following User Says Thank You to ledzep For This Useful Post:

    dEconomist (03-05-2012)

  4. #3

    Re: Lagged Dependent Variable

    Thank you so much.

    How can I include the lagged dependent variable in my existing formula:

    lmfit1<-lm(Data1$C~Data1$Y+qtr)

    where Data1$C is the consumption column of a data frame, Data1$Y is the column for personal disposable income, and qtr holds the 4 quarterly dummy variables.

    Also, is there a way to get what I need without downloading the dyn package?

    Do I have to put an L after the numbers, as you have in y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L,
    34L)?

  5. #4
    ledzep

    Re: Lagged Dependent Variable

    Quote Originally Posted by dEconomist
    Also, is there a way to get through what I need without downloading the package dyn?
    There must be, but I am not sure how.

    Do I have to put L beside of the figures like what you have y = c(34L, 24L, 35L, 53L, 24L, 68L, 86L, 73L,
    34L)?
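    No, you don't have to worry about those Ls. The L suffix just tells R that the number is an integer, and it shows up here because that is how R writes out integer vectors (for example with dput). Plain numbers like 34 work just as well.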
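
    A quick way to check this in the console (a small sketch, nothing specific to the data above):

```r
# The L suffix makes an integer literal; without it R stores a double.
print(class(34L))
print(class(34))

# The values compare equal, so a regression treats them the same way.
print(34L == 34)

# dput() writes integer vectors with the L suffix.
dput(c(34L, 24L, 35L))
```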

    How can I include the lagged dependent variable of my existing formula:

    lmfit1<-lm(Data1$C~Data1$Y+qtr)
    So you have four quarters. In that case, would you want a separate lag for each quarter?
    I'm afraid I am not sure how to add different lags for different quarters (and I don't want to give you a vague answer). We never went beyond a simple one-page example in our course. I wish more of it had been covered.

  6. #5
    bryangoodrich

    Re: Lagged Dependent Variable

    I've never used the dyn package, though I think keeping your data in a time series object (ts) can make these sorts of regressions easier. In any case, I'm not going to try to explain the whole theory behind autocorrelation here (and I hope you already know multiple regression). The basic idea, in the common approach, is that you literally add a variable to your model that holds the dependent variable from the prior period(s). This changes your error structure, though, because the error now depends on previous periods (the algebra isn't that hard).

    I've not done too many of these in practice, but when playing around with lagged variables in R, I usually just use sequences in a convenient way that models our syntax.

    For instance, suppose my dependent variable is 'y' and has length n (that is, length(y) == n is TRUE). Then I make myself an index

    Code: 
    t <- 2:n
    Why did I choose 2? Because the first observation has no predecessor, so a series lagged by one period has only n-1 usable values. It also makes the sequence 1, ..., n-1 easy to get: all I have to do is look at t-1, since R handles the vector algebra by subtracting 1 from each element. In other words, t is 2:n and t-1 is 1:(n-1). This gives us our current series y[t] and our lagged series y[t-1]. Nice syntax, right? So now I fit my lagged model with something like

    Code: 
    fit <- lm(y[t] ~ x[t] + qtr[t] + y[t-1], df)
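
    To see what the index does, here is a small sketch with a made-up series (the numbers and the name lag_pairs are only for illustration):

```r
y <- c(34, 24, 35, 53, 24, 68, 86, 73, 34)  # toy series, for illustration only
n <- length(y)
t <- 2:n                                     # positions of the "current" values

# Each row pairs an observation with the one that came just before it.
lag_pairs <- cbind(current = y[t], lagged = y[t - 1])
print(lag_pairs)
```

    Each value in the lagged column is the value that sits one row higher in the current column, which is exactly the alignment the regression needs.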
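    There's actually a function that handles the shifting for you, in the sense that you can specify the lag you want on a variable: lag (see ?lag), which shifts the time index of a ts object. It is not diff (see ?diff), which returns the differences y[t] - y[t-1] rather than the lagged values. Note that lag does nothing useful on a plain vector inside lm, because the values stay in the same positions. The problem with lag is that it's useful for a given variable, but controlling 't' like I do lets me easily supply it to my other vectors. I can also use it to apply to the dataframe itself. In this respect, I might do something like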

    Code: 
    fit <- lm(y ~ x + qtr + lagy, cbind(df[t, ], lagy = df$y[t-1]))
    Here I return only the rows of df indexed by t and create the lag variable (named lagy, matching the formula) on the fly. (diff is not a substitute here, since it returns differences rather than lagged values.)
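
    Since lag and diff are easy to mix up, here is a short sketch of what each one returns, using a made-up series. The object names (d, y_ts, y_prev, aligned) are only for this illustration.

```r
y <- c(34, 24, 35, 53, 24, 68, 86, 73, 34)  # toy series, for illustration only
n <- length(y)
t <- 2:n

# diff() returns first differences, not lagged values.
d <- diff(y)
print(all.equal(d, y[t] - y[t - 1]))

# stats::lag() on a ts object keeps the values and shifts the time index instead.
y_ts <- ts(y)
y_prev <- stats::lag(y_ts, -1)
print(time(y_ts))    # time index of the original series
print(time(y_prev))  # same values, index moved one period later

# Aligning the two series by time puts y[t] next to y[t-1].
aligned <- ts.intersect(y_ts, y_prev)
print(aligned)
```

    Writing stats::lag explicitly is a defensive choice, because some add-on packages define their own lag function with different behavior.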

    Some resources for time series in R: link, link, and link

  7. The Following 2 Users Say Thank You to bryangoodrich For This Useful Post:

    dEconomist (03-05-2012), ledzep (03-04-2012)

  8. #6

    Re: Lagged Dependent Variable

    Thaaaaankies again!!!!

    To bryangoodrich:

    You are so amazing to read what my mind needs. Thanks >.<

  9. #7

    Re: Lagged Dependent Variable

    Allow me to do a follow-up question:

    Aside from literally encoding the lagged variables myself, is there a way to make R print them? For example, when you taught me about letting R create the dummy variables, model.matrix(~Data2+qtr-1) printed my data along with the dummy variables as additional columns.

    If so, may I see the code?

    Thanks.

  10. #8
    bryangoodrich

    Re: Lagged Dependent Variable

    For any fitted regression, you can use model.matrix to return the X matrix used in the regression Y ~ Xb. Another useful function is model.frame, which returns the data frame used in the regression.
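
    As a rough sketch of how that looks with a lagged variable and quarter dummies: the data below are simulated and purely hypothetical, and the column name C_lag1 is made up for this example. Only the names Data1, C, Y and qtr follow the earlier posts.

```r
# Hypothetical quarterly data: C = consumption, Y = income, qtr = quarter factor.
set.seed(1)
Data1 <- data.frame(
  Y   = round(100 + cumsum(rnorm(12, mean = 2)), 1),
  qtr = factor(rep(1:4, times = 3))
)
Data1$C <- round(0.8 * Data1$Y + rnorm(12), 1)

# Add the one-period lag of consumption as an ordinary column (NA in row 1).
Data1$C_lag1 <- c(NA, head(Data1$C, -1))

fit <- lm(C ~ Y + qtr + C_lag1, data = Data1)

X  <- model.matrix(fit)  # intercept, Y, quarter dummies, and the lag column
mf <- model.frame(fit)   # the rows and variables lm() actually used
print(head(X))
print(head(mf))
```

    With an intercept in the model, R creates three quarter dummies and treats the first quarter as the baseline. Dropping the intercept (adding -1 to the formula) gives all four, as in the earlier model.matrix call.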

  11. The Following User Says Thank You to bryangoodrich For This Useful Post:

    dEconomist (03-05-2012)

  12. #9

    Re: Lagged Dependent Variable


    I see.

    Thaaankieees, bryangoodrich!
