## Predicting the chance of attendance for a given appointment time

Hi

I work for a health screening programme that screens hundreds of thousands of diabetic patients every year to help prevent blindness. These programmes have been running for a number of years, so there is plenty of historical data to analyse.

I have been trying to work out how to maximise the probability of a patient attending a given booked appointment time (and date) at a given location, using all of the historical data that I have.

For example, if my historical data told me that patients aged between 40 and 50 are much more likely to turn up for appointments booked after 3:00pm, then I would like to offer them those appointment times.

Or perhaps patients from certain postcodes are more likely to turn up on certain days of the week.

I know that trends like this are present in the data - for example in some towns, older patients have bus passes that make public transport free in the afternoon, and this makes them much more likely to turn up in the afternoon.
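A natural first step, before any modelling, is to tabulate the observed attendance rate for each combination of patient group and appointment slot. The sketch below assumes a hypothetical record format of (age band, time slot, attended) with made-up rows; real rows would come from the booking history. Cells with only a few bookings give noisy rates, which is one reason to move on to a regression model rather than relying on raw tables.

```python
from collections import defaultdict

# Hypothetical, made-up booking history: (age_band, slot, attended) with attended = 1 or 0.
records = [
    ("40-50", "morning", 0),
    ("40-50", "morning", 1),
    ("40-50", "afternoon", 1),
    ("40-50", "afternoon", 1),
    ("70+", "morning", 0),
    ("70+", "afternoon", 1),
]

def attendance_rates(rows):
    """Return {(age_band, slot): (attended, booked, rate)} from raw booking rows."""
    attended = defaultdict(int)
    booked = defaultdict(int)
    for age_band, slot, outcome in rows:
        key = (age_band, slot)
        booked[key] += 1
        attended[key] += outcome
    return {k: (attended[k], booked[k], attended[k] / booked[k]) for k in booked}

for key, (a, n, rate) in sorted(attendance_rates(records).items()):
    print(f"{key}: {a}/{n} attended ({rate:.0%})")
```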

So I guess the problem is this: given a patient with a set of demographics and a historical booking record, let's say:

- Date of birth x
- Postcode y
- Years since diagnosis of diabetes z
- Registered GP
- Distance from home to location
- and perhaps some others

what is the probability that they will attend an appointment with the following variables:

- Date d
- Time t
- Location l

given my historical data?

How would the probability change as I vary the date, time and location of the appointment?
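Most of the raw fields listed above need turning into numbers before a model can use them. As a hedged sketch (the function name and the particular features chosen are illustrative assumptions, not a prescribed scheme), date of birth becomes age on the day of the appointment, and the appointment date and time become day of week, hour, month and a morning/afternoon flag. Postcode, GP and location are categorical and would typically be dummy (one-hot) encoded; that step is not shown.

```python
from datetime import date, datetime

def appointment_features(date_of_birth: date, appointment: datetime) -> dict:
    """Turn raw patient/appointment fields into numeric predictors (hypothetical choice)."""
    # Age in whole years on the day of the appointment.
    had_birthday = (appointment.month, appointment.day) >= (date_of_birth.month, date_of_birth.day)
    age = appointment.year - date_of_birth.year - (0 if had_birthday else 1)
    return {
        "age": age,
        "weekday": appointment.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": appointment.hour,
        "month": appointment.month,
        "afternoon": 1 if appointment.hour >= 12 else 0,
    }

print(appointment_features(date(1970, 6, 15), datetime(2024, 3, 4, 15, 30)))
# → {'age': 53, 'weekday': 0, 'hour': 15, 'month': 3, 'afternoon': 1}
```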

If I could somehow work out these probabilities, I hope I could push up attendance and potentially save the sight of many people.

(Once I understand this a bit better, my ideal would be for a computer to calculate an optimal appointment arrangement for a given set of patients from the historical data and their demographics, but let's park this for now.)

As for my mathematical level: I did UK A-Level Maths and got good grades, then an Engineering degree (which was more calculus than probability), but I have since forgotten it all!

I would really appreciate a few pointers, and I apologise if this is completely the wrong place to ask something like this.

Thank you!!

In general this can be treated as a regression problem.

You might consider a logistic regression model, since your response is whether or not the patient attends and what you want to model is the probability of attendance.
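For reference, the standard logistic regression model writes the probability of attendance as the logistic function of a linear combination of predictors. Here $x_1, \dots, x_k$ stand for whichever patient and appointment variables are chosen (age, hour, day of week and so on), and the coefficients $\beta_0, \dots, \beta_k$ are estimated from the historical data:

$$
P(\text{attend} = 1 \mid x_1, \dots, x_k) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\right)}
$$

Equivalently, writing $p$ for that probability, the log-odds are linear in the predictors:

$$
\log \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

so each $\beta_j$ is the change in the log-odds of attending for a one-unit change in $x_j$, with the other variables held fixed.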

Very interesting. The first step, and the first hurdle, with most problems is getting the data and having a sufficient sample size. Fortunately you already have both your data and your question. Now comes how to use that data. From what you have written, you need to start exploring regression analysis and, once you are comfortable with it, work your way up to hierarchical regression. This may take quite a bit of work and experience. I would also recommend contacting the closest major university (its statistics or biostatistics department). This could be a great project for a biostatistics or epidemiology student to take on as a thesis, potentially at no cost to you. You should also try to learn these concepts along the way, so that the project is sustainable in the future.

I agree with BGM that it sounds like a logistic regression problem. You could code the outcome as 1 = the patient attended their appointment and 0 = they did not. Then, using the variables you described (age, day of week, month and so on), you could model the probability that, say, a patient aged X with an appointment on a Monday in March would show up.
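To make the 1/0 coding concrete, here is a minimal, dependency-free sketch that fits a logistic regression by batch gradient descent on the log-loss, using only the Python standard library. The toy data, the two features (age and a 0/1 afternoon flag) and the function names are hypothetical illustrations. A statistics package (for example R's `glm` with `family = binomial`) would fit the same model and also report standard errors, which this sketch does not.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss.
    X: list of feature lists (no intercept column); y: list of 0/1 outcomes."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability of attendance for feature list x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def features(age, afternoon):
    # Hypothetical encoding: age in decades centred on 55, plus a 0/1 afternoon flag.
    return [(age - 55) / 10.0, afternoon]

# Made-up toy history: (age, afternoon slot?, attended?)
history = [
    (45, 0, 0), (45, 0, 1), (45, 1, 1), (45, 1, 1),
    (48, 0, 0), (48, 1, 1), (42, 0, 0), (42, 1, 0),
    (70, 0, 1), (70, 1, 1), (72, 0, 0), (72, 1, 1),
]
X = [features(age, aft) for age, aft, _ in history]
y = [attended for _, _, attended in history]
w = fit_logistic(X, y)

# Vary the appointment slot for a fixed patient and compare predicted probabilities.
for age in (45, 71):
    for aft, label in ((0, "morning"), (1, "afternoon")):
        print(f"age {age}, {label}: P(attend) = {predict(w, features(age, aft)):.2f}")
```

Holding the patient's features fixed and changing only the appointment features is how a fitted model answers the "how would the probability change" question; with real data the features would include day of week, location and so on.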

Thanks for your kind responses. It sounds like I need a crash course in logistic regression. Does anyone know a decent web-based tutorial before I start Googling?

I am based right next to the University of Cambridge, but getting someone else to do it feels a bit like the easy way out!

I think I would prefer to try myself - I hope I will really enjoy the journey.

What stats programs do you have or feel comfortable using? SAS, R, SPSS...?

I don't have any stats tools apart from Microsoft Excel, though I do a lot of C/C#/C++ programming. Do I need to get one?

