
Thread: Imputing missing values thanks to correlated data ?

  1. #1

    Hello everyone,

    I'm new here, nice to meet you. As English is not my native language, I apologise if I'm not always clear; please don't hesitate to ask me to rephrase in that case.

    There is a question I would like to ask you:
    Suppose I have a series of values of one variable through time. Some values in this series are missing, and I would like to estimate them.
    Now, I have a second series for a second variable. This one is complete (I have all of the values I need). This variable is strongly correlated with the first one (R close to 1, p-value < 0.001).

    Is it possible to estimate the missing data of the first variable using the data from the other variable?
    I should specify that although those two variables are very strongly correlated, I can't really assume that one explains the other (so I can't use linear regression, unless I'm wrong).

    I hope that my question is clear and I thank you in advance.

  2. #2

    Missing data analysis is complicated.

    It all depends on how your data are missing (i.e. missing completely at random, missing at random, or missing not at random), how much of the data is missing, the type of data, and what you plan to do with the data once imputed.
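
    To make the three mechanisms concrete, here is a minimal simulation sketch in Python (NumPy assumed). The variable names, the 20% missing rate, and the cut-offs are illustrative assumptions and have nothing to do with the actual data in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                       # fully observed covariate
y = 0.9 * x + rng.normal(scale=0.3, size=n)  # variable that will get holes

# MCAR: missingness is independent of both x and y.
mcar = rng.random(n) < 0.2

# MAR: missingness depends only on the observed x.
mar = x > np.quantile(x, 0.8)

# MNAR: missingness depends on the unobserved value of y itself.
mnar = y > np.quantile(y, 0.8)

def observed_mean(values, missing):
    """Mean of the values that remain once the masked ones are dropped."""
    return values[~missing].mean()

for name, mask in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(observed_mean(y, mask), 3), "full mean:", round(y.mean(), 3))
```

    Under the MCAR mask the mean of the remaining values stays close to the full mean; under the MAR and MNAR masks it shifts. In the MAR case the observed x can, in principle, be used to correct for that, which is why the mechanism matters for the choice of method.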

    If the two variables are so highly correlated (assuming this high correlation is not the result of missing data), why can't you use the complete variable instead for your analyses?

  3. #3
    hlsmith

    I will echo evelyn13's comments: it depends on why the values are missing. Look up the monotone missing data pattern. Does it seem applicable, given that these are time series data?
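
    For anyone unsure what a monotone pattern means in practice, here is a rough check in Python with pandas. It assumes a layout with one row per unit (for example a site) and one column per measurement occasion; the function name `is_monotone` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def is_monotone(df: pd.DataFrame) -> bool:
    """True if the columns can be ordered so that missingness is nested:
    once a value is missing in a row, every later column is missing too."""
    miss = df.isna()
    # Order columns from fewest to most missing values.
    order = miss.sum().sort_values(kind="stable").index
    m = miss[order].to_numpy()
    # Along each row, the missing flag must never go from True back to False.
    return bool(np.all(m[:, 1:] >= m[:, :-1]))

monotone = pd.DataFrame({
    "t1": [1.0, 2.0, 3.0],
    "t2": [1.0, 2.0, np.nan],
    "t3": [1.0, np.nan, np.nan],
})
scattered = pd.DataFrame({
    "t1": [1.0, np.nan, 3.0],
    "t2": [np.nan, 2.0, 3.0],
})
print(is_monotone(monotone), is_monotone(scattered))
```

    Sorting the columns by their number of missing values is enough here, because nested missingness implies non-decreasing counts along any valid ordering.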
    Stop cowardice, ban guns!

  4. #4
    noetsi

    In the context of time series it's often stated that, unlike with regular data, you cannot have any missing points in the series being analyzed. Issues like MAR and MCAR don't seem to be considered there, perhaps because of the issue of seasonality.

    Interestingly, in the missing data literature I have not seen whether having time series data changes how you should replace the missing values.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  5. #5
    hlsmith

    There are slightly different approaches in time series, I believe (last observation carried forward, nearest neighbor, etc.). Also, depending on your data, mixed models can function with missing data, though that approach could be less than ideal if data are MAR or MNAR.
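
    A small pandas sketch of what those single-series fills look like, with linear interpolation added for comparison. The dates and temperatures are invented for the example.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2020-01-01", periods=9, freq="D")
temp = pd.Series(
    [5.0, np.nan, np.nan, 8.0, 9.0, 7.0, np.nan, np.nan, 6.0], index=days
)

# Last observation carried forward: each gap takes the previous observed value.
locf = temp.ffill()

# Nearest neighbour in time: each gap takes the closest observed value.
nearest = temp.dropna().reindex(temp.index, method="nearest")

# Linear interpolation between the observations on either side of a gap.
linear = temp.interpolate(method="linear")

print(pd.DataFrame({"raw": temp, "locf": locf, "nearest": nearest, "linear": linear}))
```

    None of these three fills uses information from a second, correlated series; they only look at neighbouring values of the same series, so they are best suited to short gaps.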

  6. #6
    noetsi

    There are many approaches to replacing missing data in time series. I tend not to think of that as replacing missing data in the MI (multiple imputation) context; the logic appears to be different. I don't think these approaches even consider the issue of MAR or MCAR. I suspect they assume that missing time series data are entirely accidental [the value just was not gathered that period], so all such data are MCAR.

  7. #7

    Thank you for all your answers. Indeed, the data are MCAR, if I'm not mistaken. They are daily mean temperature values over a long period of time (several years) at different sites. At some sites, data are missing for some months or some years because the temperature sensor was not active. Most of the time it is the first years that are missing (so I'd say this is a monotone missing pattern?). In total, I'd say about 1/6 of the data are missing.

    The "complete" variable is a kind of temperature estimator, less precise than the daily mean temperature measurements. What I'd like to do is explain a third variable with the daily mean temperature variable. So I guess I could explain this other variable with the "complete" variable, but it does not really correspond to my hypothesis (and it tends to overestimate daily temperature, though it is well correlated with it). Actually I don't know; maybe keeping this complete variable as an explanatory variable would be the best option. But my supervisors asked me to estimate the "real" temperature variable (I'm a master's student intern), so maybe I can discuss this with them. In any case, I can't do my analysis with missing data, because I'm using a specific statistical tool that needs all the data to be present (at least for several periods within each year).

    I also thought about nearest-neighbor methods, but doesn't this mean I could not use the information given by the correlated variable? I'll gladly take any ideas or advice you have; I admit I'm only a beginner. But maybe the idea of keeping the complete variable remains the best?
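
    One way the correlated variable could be used is a simple regression fill: fit a straight line on the days where both series exist, then predict the missing days from the complete series. Using regression for prediction like this does not require that one variable causes the other, only that the association holds on the missing days too. The sketch below is in Python with NumPy; the function name `regression_impute` and the synthetic "proxy" data are assumptions made up for illustration.

```python
import numpy as np

def regression_impute(target, proxy):
    """Fill NaNs in `target` with predictions from a straight-line fit on `proxy`."""
    target = np.asarray(target, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    obs = ~np.isnan(target)
    # Fit target = intercept + slope * proxy on the days where both are available.
    slope, intercept = np.polyfit(proxy[obs], target[obs], deg=1)
    filled = target.copy()
    filled[~obs] = intercept + slope * proxy[~obs]
    return filled, slope, intercept

# Synthetic example: a proxy that tends to overestimate the real temperature.
rng = np.random.default_rng(1)
proxy = rng.uniform(0, 25, size=600)
true_temp = -1.5 + 0.95 * proxy + rng.normal(scale=0.5, size=600)
measured = true_temp.copy()
measured[:100] = np.nan  # sensor inactive at the start of the record

filled, slope, intercept = regression_impute(measured, proxy)
print(round(slope, 2), round(intercept, 2))
```

    The fitted intercept and slope absorb a systematic overestimate in the proxy. Two caveats: the imputed values fall exactly on the fitted line, so they are less variable than real measurements, and this sketch ignores seasonality and autocorrelation.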

  8. #8
    noetsi

    You are learning the joy [aka pain] of working with real data in the real world, where commonly there are no easy or good solutions.

    Multiple imputation, which uses existing variables that are correlated with the incomplete one to create values for the missing data, might work. I have not seen this used with time series, although I assume it can be. You might look this up and see if it deals with your problem. It's not the simplest of methods, although if you are a master's student it will be useful to learn.
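
    To give a feel for the idea, here is a deliberately simplified sketch in Python with NumPy. It is not a full multiple imputation procedure: it refits a straight line on a bootstrap resample for each imputation, adds random residual noise, and then pools a mean with Rubin's rules. The function names, the choice of 20 imputations, and the synthetic data are all assumptions for illustration, and the within-imputation variance treats days as independent, which real daily temperatures are not.

```python
import numpy as np

def multiply_impute(target, proxy, m=20, seed=0):
    """Return m completed copies of `target`, each filled by a bootstrap
    regression on `proxy` plus random residual noise."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    obs = np.flatnonzero(~np.isnan(target))
    mis = np.flatnonzero(np.isnan(target))
    completed = []
    for _ in range(m):
        # Resample observed pairs so that parameter uncertainty is propagated.
        idx = rng.choice(obs, size=obs.size, replace=True)
        slope, intercept = np.polyfit(proxy[idx], target[idx], deg=1)
        resid = target[idx] - (intercept + slope * proxy[idx])
        sigma = resid.std(ddof=2)
        filled = target.copy()
        # Add noise so imputed values scatter around the line like real ones.
        noise = rng.normal(scale=sigma, size=mis.size)
        filled[mis] = intercept + slope * proxy[mis] + noise
        completed.append(filled)
    return completed

def pool_means(completed):
    """Pool the sample mean across imputations with Rubin's rules."""
    m = len(completed)
    estimates = np.array([c.mean() for c in completed])
    # Within-imputation variance of a mean, assuming independent observations.
    within = np.mean([c.var(ddof=1) / c.size for c in completed])
    between = estimates.var(ddof=1)
    total_var = within + (1 + 1 / m) * between
    return estimates.mean(), np.sqrt(total_var)

# Synthetic example data (made up).
rng = np.random.default_rng(1)
proxy = rng.uniform(0, 25, size=600)
true_temp = -1.5 + 0.95 * proxy + rng.normal(scale=0.5, size=600)
measured = true_temp.copy()
measured[:100] = np.nan

completed = multiply_impute(measured, proxy, m=20, seed=0)
estimate, std_error = pool_means(completed)
print(round(estimate, 2), round(std_error, 3))
```

    In a real analysis the model of interest would be fitted on each completed series and its coefficients pooled the same way, rather than a simple mean.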

  9. #9

    My apologies, I read the OP in a rush and completely missed that it was time series data.
    Out of interest, what analyses are you planning on using?

  10. #10

    Thank you for your kind answers!

    Sorry for answering so late. I'm planning to use a regression analysis or a time series analysis to interpret the effects of these climatic covariates (in interaction with other ones) on biological variables. My objective is not really to predict the time series, but rather to explain the links between the variables. I still don't really know how I'm going to do this, however; I'm still thinking about it.
