Thread: Seeking advice on predictive modeling approach

  #1 kiton

    Hello dear forum members!

    Currently, I am working on a project that aims to predict a certain cancer-related outcome (y) using a number of control (c) and predictor (X) variables:

    y(i) = a + c(it) + X(it) + u (1)

    In Equation (1), y(i) is continuous, but it is available only as a mean aggregated over 2009 to 2013; c(it) is a vector of several longitudinal (yearly) control variables available from 2009 through 2013; and X(it) is a vector of several longitudinal (yearly) predictor variables available from 2010 through 2013.

    As you can see, the outcome does not vary over time because it is available only as an aggregated mean, whereas the controls and predictors are in panel form. Given this limitation, panel models do not seem applicable. Therefore, my approach is first to estimate:

    y(i) = a + c(i) + X(i) + u (2), where c(i) and X(i) are aggregated as means

    Second, to (a) check the consistency of the coefficients and (b) test for lagged effects, I would estimate:

    y(i) = a + c(it-1) + X(it-1) + u (3), where c(it-1) and X(it-1) are from 2012 only
    y(i) = a + c(it-2) + X(it-2) + u (4), where c(it-2) and X(it-2) are from 2011 only
    y(i) = a + c(it-3) + X(it-3) + u (5), where c(it-3) and X(it-3) are from 2010 only

    Please advise whether my modeling approach seems plausible (considering the limitation of the DV data).
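
    As a minimal sketch of Equations (2) through (5), the snippet below collapses a panel to unit-level means and then fits separate single-year cross-sections. Everything in it is an assumption made for illustration: the data are synthetic, there is only one control and one predictor, and the helper `ols` is hypothetical rather than taken from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 200
years = np.arange(2010, 2014)  # years in which both c and X are observed

# Hypothetical panel: one control c and one predictor X per unit and year.
c = rng.normal(size=(n_units, years.size))
X = rng.normal(size=(n_units, years.size))

# Synthetic outcome, available only as one multi-year mean per unit.
y = 1.0 + 0.5 * c.mean(axis=1) + 2.0 * X.mean(axis=1) + rng.normal(scale=0.1, size=n_units)


def ols(y, *columns):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Equation (2): regress y(i) on the unit-level means of c and X.
beta_means = ols(y, c.mean(axis=1), X.mean(axis=1))

# Equations (3)-(5): regress y(i) on one year's values at a time (2012, 2011, 2010).
beta_by_year = {
    int(yr): ols(y, c[:, j], X[:, j]) for j, yr in enumerate(years) if yr < 2013
}

print("means:", beta_means.round(2))
for yr, b in sorted(beta_by_year.items(), reverse=True):
    print(yr, b.round(2))
```

    Because the synthetic years are drawn independently here, each single-year slope comes out at roughly a quarter of the corresponding means-model slope. With autocorrelated real data the single-year and means-based coefficients would likely sit closer together, which is part of what the comparison across Equations (3) to (5) is meant to reveal.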

  #2 hlsmith

    How does the variability of the parameters get into the model? If it does not, I would imagine the SE values may be underestimated and you risk type I errors. In the back of my mind, your approach seems like something an economist might do; perhaps one of them can chime in on the pros and cons. Using the means also would not control for trends over the years, so you wouldn't know whether a variable went up one year and down the next; you would capture level changes but not trend changes. I understand, though, that you are trying to do the best with what you have.


    I didn't understand what you were getting at in the second part. Are you looking for autocorrelation?
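
    To make the point about lost trends concrete, here is a small sketch with two hypothetical units whose yearly values are invented for illustration. They share the same mean but move in opposite directions, so a means-only model treats them as identical.

```python
import numpy as np

years = np.arange(2010, 2014)
t = years - years.mean()  # centered time index keeps the slope fit well conditioned

rising = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical unit with an upward trend
falling = rising[::-1]                   # same values reversed: downward trend

# Both units collapse to the same mean.
mean_rising, mean_falling = rising.mean(), falling.mean()

# A per-unit least-squares slope recovers the direction that the mean discards.
slope_rising = np.polyfit(t, rising, 1)[0]
slope_falling = np.polyfit(t, falling, 1)[0]

print(mean_rising, mean_falling)
print(round(slope_rising, 2), round(slope_falling, 2))
```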
    Stop cowardice, ban guns!

  #3 kiton

    Dear hlsmith,

    Thank you for your response and for the issues you raised. Perhaps some "pooled" model could be used, e.g.:

    y(i) = a + c(it) + c(it-1) + c(it-2) + c(it-3) + X(it) + X(it-1) + X(it-2) + X(it-3) + u (6)

    In Equation (6): N(c) = 13, N(X) = 46, and N(obs) = 2,779

    The actual estimation results (obtained via OLS with robust SEs) are quite intriguing, accepting the limitation that y = Mean[2009-2013]. For example, consider the attached plot of the residual quantiles against the quantiles of the normal distribution: up to a point, the residual points form a straight line, which I take as a sign of a very good fit. Also note the attached plot of the residuals against the DV. Testing for heteroskedasticity, the test fails to reject the null of constant variance (i.e., the homoskedasticity assumption is not rejected). I think a follow-up quantile regression analysis would be appropriate.

    As for my initial lagged-effects approach: yes, the goal was to check the robustness of the coefficients, since autocorrelation is present in some controls and predictors.
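
    A rough sketch of what fitting a pooled model like Equation (6) with robust standard errors could look like follows. It is not the actual analysis: the data are synthetic, a single autocorrelated predictor stands in for the 13 controls and 46 predictors, HC1 is just one common robust-SE variant, and Koenker's studentized Breusch-Pagan statistic is one common heteroskedasticity check, not necessarily the test used above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_years = 500, 4  # columns ordered 2010..2013, so the last column is year t

# Hypothetical predictor with year-to-year persistence (AR(1)-like).
X = np.empty((n_units, n_years))
X[:, 0] = rng.normal(size=n_units)
for j in range(1, n_years):
    X[:, j] = 0.5 * X[:, j - 1] + rng.normal(size=n_units)

# Synthetic outcome: observed only as a unit-level mean, plus noise.
y = 1.0 + 2.0 * X.mean(axis=1) + rng.normal(scale=0.5, size=n_units)

# Design matrix for Equation (6): intercept, X(it), X(it-1), X(it-2), X(it-3).
design = np.column_stack([np.ones(n_units), X[:, ::-1]])

# OLS point estimates and residuals.
xtx_inv = np.linalg.inv(design.T @ design)
beta = xtx_inv @ design.T @ y
resid = y - design @ beta

# HC1 robust covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k).
n, k = design.shape
meat = (design * resid[:, None] ** 2).T @ design
cov_hc1 = xtx_inv @ meat @ xtx_inv * n / (n - k)
robust_se = np.sqrt(np.diag(cov_hc1))

# Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing e^2 on the design.
e2 = resid ** 2
fitted = design @ np.linalg.lstsq(design, e2, rcond=None)[0]
r_squared = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
lm_stat = n * r_squared

print("beta:     ", beta.round(3))
print("robust SE:", robust_se.round(3))
print("BP LM statistic:", round(lm_stat, 3))
```

    Under the null of constant variance, the LM statistic is compared against a chi-square distribution with k - 1 = 4 degrees of freedom (5% critical value of about 9.49), so a smaller value means the test fails to reject homoskedasticity.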
    [Attached: two plots (Screen Shot 2017-05-30 at 14.13.55.jpg, Screen Shot 2017-05-30 at 14.03.32.jpg)]
    Last edited by kiton; 05-30-2017 at 03:24 PM.

  #4 hlsmith

    Just a side note: I once read a position piece on why quantile regression sees limited use in published research. It argued that with quantiles, external validity becomes limited, because other samples will have different quantiles than yours, so generalizing your results is hindered. This is unlike, say, OLS, where the results can potentially be applied to any pseudo-realization of the population (just plug in values). I am not doing the article justice, and I have no reference, sorry.
