I'm not sure I'm thinking about this correctly and would be grateful for any insight; I believe the problem is analogous to one that comes up in biostatistics.

I have data on payments to attorneys for representing clients at public expense. I only learn that a representation exists when the attorney submits a voucher for payment.

The voucher contains the date the attorney was appointed, which is the first I learn of the appointment. So there are presumably numerous current appointments of which I am not yet aware.

I am being asked to estimate the number of appointments in a given time period, based on the vouchers we have seen submitted to date.

So I can look at the appointment month/year and the payment month/year and develop a sense of how long it takes for the payment to show up (the "event," so to speak). Most payments appear within a couple of years of the appointment.
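To make that concrete, here is roughly how I'm computing the delays (a minimal sketch; `vouchers.csv` and the column names are placeholders for my actual data):

```python
# A minimal sketch of the delay calculation; file and column names are
# placeholders for my actual data.
import pandas as pd

vouchers = pd.read_csv("vouchers.csv", parse_dates=["appointment_date", "payment_date"])

# Delay from appointment to voucher payment, in whole calendar months.
vouchers["delay_months"] = (
    (vouchers["payment_date"].dt.year - vouchers["appointment_date"].dt.year) * 12
    + (vouchers["payment_date"].dt.month - vouchers["appointment_date"].dt.month)
)
print(vouchers["delay_months"].describe())
```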

If I make the (substantial) assumption that current behavior is similar to that of earlier years, I would use payment and appointment data from, say, 2009-12 to estimate how many 2014 appointments are missing from the 2014 payment data (because their vouchers have not yet been submitted). I would do this by fitting a survival model to the appointment-to-payment delays in the 2009-12 data, then using those results to project how many additional 2014 appointments will eventually "show up" beyond what we see in the 2014 payment data (which is a truncated set of 2014 appointments), roughly as sketched below.
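Here is a rough sketch of the projection step I have in mind (again, file/column names and the as-of date are placeholders; with every delay in the old cohort fully observed, the Kaplan-Meier fit reduces to the empirical CDF, but the machinery would let me add censoring later):

```python
# A rough sketch of the projection; file/column names and the as-of date
# are placeholders for my actual data.
import pandas as pd
from lifelines import KaplanMeierFitter

vouchers = pd.read_csv("vouchers.csv", parse_dates=["appointment_date", "payment_date"])
vouchers["delay_months"] = (
    (vouchers["payment_date"].dt.year - vouchers["appointment_date"].dt.year) * 12
    + (vouchers["payment_date"].dt.month - vouchers["appointment_date"].dt.month)
)

as_of = pd.Timestamp("2015-06-30")  # hypothetical "today"

# 1) Fit the delay distribution on cohorts old enough that essentially
#    every voucher has had time to arrive, so delays are fully observed.
old = vouchers[vouchers["appointment_date"].between("2009-01-01", "2012-12-31")]
kmf = KaplanMeierFitter()
kmf.fit(old["delay_months"])  # no censoring: all delays observed

def frac_reported(months_elapsed):
    """Estimated probability a voucher has been submitted within t months."""
    return 1.0 - kmf.survival_function_at_times(months_elapsed).iloc[0]

# 2) Inflate each 2014 appointment month's observed count by the fraction
#    of its vouchers expected to have arrived by the as-of date.
recent = vouchers[vouchers["appointment_date"].dt.year == 2014]
counts = recent.groupby(recent["appointment_date"].dt.to_period("M")).size()

projected = {}
for month, n_seen in counts.items():
    elapsed = (as_of.year - month.year) * 12 + (as_of.month - month.month)
    p = frac_reported(elapsed)
    projected[month] = n_seen / p if p > 0 else float("nan")

print(pd.Series(projected).round(1))
```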

I feel like this is a better approach than simply running an ARIMA forecast or the like on historical appointment data, or on payment data that is a truncated representation of the actual appointments.

Having said that, I'm not feeling too comfortable with it, and as I said, I would really appreciate any insights.

Thank you for your time, and best wishes all.