How to deal with missing values at the end of a time series?

#1
Hi everyone!

I have some problems concerning missing values in time-series data.
I have 11 values for each subject; the values are hormone concentrations measured at different time points (0, +40, +50, +55...). Some of these values are missing. I want to impute the missing data, and I have to use univariate imputation methods that handle unequally spaced time series. I tried the "zoo" package, with the na.approx() and na.spline() functions.
na.approx() replaces NAs by linear interpolation, while na.spline() replaces NAs by cubic spline interpolation (a piecewise polynomial function). Theoretically, spline interpolation seems more appropriate, because I hypothesize an increase in hormone concentration around the 6th sample followed by a decrease. However, na.spline() replaces some of my NAs with negative values, which is impossible for my type of data (a biological assay).
On the other hand, na.approx() replaces my NAs with realistic values, but fails to replace NAs that fall at the end of the time series.

Here is an example of what I have done:
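Code:
library(zoo)  # provides zoo(), na.spline() and na.approx()

cort.data2 <- c(2.34, 1.5, NA, NA, NA, 2.57, 3.53, 3.63, NA, NA, NA)
cort.time2 <- c(0, 43, 49, 54, 59, 69, 74, 81, 95, 110, 125)
sj02AM <- zoo(cort.data2, cort.time2)  # series indexed by sampling time
na.spline(sj02AM, na.rm = FALSE)  # cubic spline interpolation
na.approx(sj02AM, na.rm = FALSE)  # linear interpolation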
With cort.time2 = sampling time in minutes (baseline at 0 min), and cort.data2 = the concentration of my hormone of interest (which cannot be below 0).

Results are:
For na.spline(): 2.34 1.50 1.25 1.20 1.36 2.57 3.53 3.63 -5.14 -35.93 -99.09
For na.approx(): 2.34 1.50 1.75 1.95 2.16 2.57 3.53 3.63 NA NA NA
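
One possible way to fill the trailing NAs with na.approx(), offered only as a sketch: the extra argument rule = 2 is forwarded to stats::approx(), which then repeats the closest observed value outside the observed time range. For the tail this amounts to carrying the last observation forward, so it is an assumption about the data rather than an estimate, and it may not suit a curve that is expected to decline.

```r
library(zoo)

cort.data2 <- c(2.34, 1.5, NA, NA, NA, 2.57, 3.53, 3.63, NA, NA, NA)
cort.time2 <- c(0, 43, 49, 54, 59, 69, 74, 81, 95, 110, 125)
sj02AM <- zoo(cort.data2, cort.time2)

# Interior gaps: linear interpolation on the actual sampling times.
# Leading/trailing gaps: rule = 2 makes approx() return the nearest
# observed value instead of NA.
filled <- na.approx(sj02AM, rule = 2)
print(filled)
```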

I looked at some other packages, but they do not seem to be suitable for unequally spaced time series. Also, I do not have a background in statistics or mathematics, so I don't think I am able to write my own function in R...

Thank you for your help
 

hlsmith

#2
Cool problem. What is the final purpose of your project? What will you do with these data?

How much missingness is there?

Also, with missingness people typically recommend multiple imputation to address the uncertainty in the imputed values. I'm not sure how to put constraints on splines, but that may be the best route. There may also be a Bayesian version that can put weight on positive values.
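
A minimal sketch of one simple way to keep spline-imputed values from going negative, assuming the concentrations are strictly positive: interpolate on the log scale and transform back. This is neither a Bayesian method nor multiple imputation, just a transformation, and it reuses the example series from post #1.

```r
library(zoo)

cort.data2 <- c(2.34, 1.5, NA, NA, NA, 2.57, 3.53, 3.63, NA, NA, NA)
cort.time2 <- c(0, 43, 49, 54, 59, 69, 74, 81, 95, 110, 125)
sj02AM <- zoo(cort.data2, cort.time2)

# Fit the cubic spline to log(concentration); exp() of any finite number
# is positive, so the back-transformed values cannot go below zero.
log_filled <- na.spline(log(sj02AM), na.rm = FALSE)
pos_filled <- exp(log_filled)
print(pos_filled)
```

Values imputed beyond the last observed sample are still extrapolations, so they can be non-negative and yet biologically implausible.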
 
#3
From these data I want to calculate the area under the curve of each participant (with respect to the ground and with respect to the increase), then average them to compare 2 groups of participants.
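
For reference, both areas can be computed with the trapezoidal rule, which works directly with unequally spaced time points. The sketch below assumes the usual definitions: the AUC with respect to ground is the total area under the curve, and the AUC with respect to increase subtracts the area under the first (baseline) value. The function names auc_ground and auc_increase are made up for illustration.

```r
# Trapezoidal AUC with respect to ground (zero), ignoring NA samples.
auc_ground <- function(time, value) {
  ok <- !is.na(value)
  tt <- time[ok]
  yy <- value[ok]
  sum(diff(tt) * (head(yy, -1) + tail(yy, -1)) / 2)
}

# AUC with respect to increase: subtract the rectangle under the
# first observed value over the observed time span.
auc_increase <- function(time, value) {
  ok <- !is.na(value)
  tt <- time[ok]
  yy <- value[ok]
  auc_ground(tt, yy) - yy[1] * (tt[length(tt)] - tt[1])
}

# Observed (non-missing) samples from the example series in post #1
obs_time  <- c(0, 43, 69, 74, 81)
obs_value <- c(2.34, 1.5, 2.57, 3.53, 3.63)
print(auc_ground(obs_time, obs_value))
print(auc_increase(obs_time, obs_value))
```

Dropping an interior NA before applying the trapezoidal rule gives the same area as filling it by linear interpolation. Trailing NAs, however, shorten the time window, so the areas are no longer comparable across participants.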

At the moment, of the 25 participants included, 6 have at least one missing value, including 2 with missing values at the end of the series.

Multiple imputation does seem like a good suggestion, thank you.
However, so far I can't find a way to perform it on unequally spaced time-series data. Do you know a method or a package that can do this?
 
#5
I'm not familiar with the area under the series. Here, I want to calculate the AUC for each of my participants. I have the feeling that it is the same thing, except that my data are a time series. The time-series structure should only matter for the imputation of missing data, not for the AUC calculation, because I can't use classical imputation methods such as mean imputation or last observation carried forward.