Interpolated Rates vs Initial Rates

Hello everyone,

I am currently doing a postgraduate degree in Applied Biostatistics and I am trying to find an answer to the following problem:

I have been given the below dataset:

Age    Mortality Rate
0      5%
10     8%
20     9%
30     7%
40     11%
50     14%
60     22%
70     28%

I have been asked to perform a cubic spline interpolation in order to find the missing values (mortality rates) for ages 1-9, 11-19, and so on, up to age 69.

I have performed the cubic spline interpolation using R and got the interpolated values for each age.

My question: Is there a way (some kind of a test?) to validate my model? To be more specific, is there some kind of test showing that the results produced are a good fit to the initial rates?

To be honest, I don't even know if my question is correct. Please feel free to ask for any clarification.

Thank you


Active Member
You could run a leave-one-out cross-validation and calculate the relative approximation error. However, the resulting estimate would be rough, since it assumes the approximation error is the same in every segment of the curve, which is not very realistic. For example, most curves are harder to interpolate near the ends than in the middle.
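A minimal sketch of that leave-one-out idea, written in Python rather than R and using only the standard library. Some assumptions to flag: the spline here uses natural boundary conditions (second derivative zero at both ends), which may differ from whatever the original R fit used; the function names `natural_cubic_spline` and `loo_relative_errors` are made up for illustration; and only the interior ages (10 to 60) are held out, because dropping age 0 or age 70 would turn the prediction into an extrapolation. An interpolating spline reproduces the given points exactly, so the in-sample error is zero by construction and only a held-out error like this says anything about the fit.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Natural boundary conditions are an
    assumption of this sketch, not something stated in the thread.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Forward elimination (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        for k in range(n - 2, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Find the segment [xs[i], xs[i + 1]] containing x (clamped to the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

def loo_relative_errors(xs, ys):
    """Hold out each interior point, refit, and return {x: relative error}."""
    errors = {}
    for i in range(1, len(xs) - 1):
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        predicted = natural_cubic_spline(x_train, y_train)(xs[i])
        errors[xs[i]] = abs(predicted - ys[i]) / abs(ys[i])
    return errors

# Data from the original post (mortality rates in percent).
ages = [0, 10, 20, 30, 40, 50, 60, 70]
rates = [5.0, 8.0, 9.0, 7.0, 11.0, 14.0, 22.0, 28.0]

for age, err in loo_relative_errors(ages, rates).items():
    print(f"held-out age {age}: relative error {err:.1%}")
```

Reporting the error per held-out age, rather than a single average, makes it visible where the curve is harder to reconstruct. With only eight points, each refit spans a 20-year gap around the held-out age, so these errors are likely a pessimistic guide to the error at the 10-year spacing actually used.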


Less is more. Stay pure. Stay poor.
I think your approach is fine if you just need estimates. I agree with @staassis that, given the data, there isn't a straightforward way to check the missing values. Even if you partition your data, those values are still missing, and you are then just interpolating from smaller datasets. LOO is probably a better approach than k-fold CV here.