Brownian motion around a sinusoid.

Prometheus

Member
I wish to model arrivals of people into a healthcare queue. The methods I have studied involve assuming that the arrival rate is constant - but the data clearly show this is not true, it shows a sinusoidal pattern.

I was thinking of fitting a sinusoid to the data (using least squares), then adding a Brownian motion around the sinusoid.

But before going an further down this road, I just wondered if this is a reasonable idea?

BGM

TS Contributor
I have not experience with this before. But modelling by Brownian motion with a sinusoidal drift looks interesting.

Prometheus

Member
I've just started working on this, but it's not going well.

To start with I just want to fit a sinusoid to the data i have, this is what i did:

The dataframe looks like this (15000 observations)
Code:
t y
1 1 2
2 2 3
3 3 9
4 4 2
5 5 7
6 6 2
Then i tried fitting a sinusoid like this:

Code:
res <- nls(y ~ A*sin(omega*t+phi)+C, data=data.frame(t,y), start=list(A=13,phi=0,omega=2*pi/24, C=5))
co <- coef(res)
summary(res)
fit <- function(x, a, b, c, d) {a*sin(b*x+c)+d}
plot(x=t, y=y)
curve(fit(x, a=co["A"], b=co["omega"], c=co["phi"], d=co["C"]), add=TRUE ,lwd=2, col="steelblue")
I've attached the graph. The result is extremely poor, especially the amplitude. I've tried different starting values, and insisting omega is a particular value (as i anticipate a daily frequency - interestingly the model estimates this parameter nearly exactly to what i anticipated anyway).

Any tips?

Dason

What makes you think a sinusoid fit is what you want?

Dason

For reference here is a plot of random exponential variates over time. They all have the same parameter.

View attachment 5499

If it looks like there is a non constant trend then that's entirely spurious. When talking about trend you want to downweight the extremes. If you can remove a few of the extreme values and that completely changes what you think about the "trend" then it probably wasn't very strong anyways.

Prometheus

Member
To start with i'm insisting on it based on knowledge of the system. The data is arrivals to an emergency department which should have a distinctly sinusoidal nature following the diurnal nature of humans. (I also anticipate a weekly frequency, but one thing at a time).

I know the data looks like a mess, but also based on this graph (of average number of arrivals per hour) i've attached, i thought a sinusoid reasonable.

Last edited:

Dason

And what does t represent in your original graph

Prometheus

Member
t is hour from the first observation. I originally had the dates and times (to the minute) as Posixct objects in R, but i was having problems trying to fit a non-linear model with them, so i created t.

GretaGarbo

Human
To start with i'm insisting on it based on knowledge of the system.
I certainly agree with that! Therefore I suggest that Prometeus search at google scholar for “arrivals emergency department”, (this one is an example), and look how published papers usually model that.

A Brownian motion is normally distributed, right? Isn't it more natural to use a discrete distribution that is non-negative like the Poisson distribution or the negative binomial?

If you like the sine-model, then start with that! But isn't it true that a model “A*sin(omega*t+phi)+C” = a + b1*sin(t) + b2*cos(t)? I am to lazy to check if it is true with that equal sign. But if that is correct, then that means that you don't need to run non-linear regression, as it can be expressed by a linear model: y = a + b1*x1 + b2*x2 + eps, where x1 = sin(t) and x2 = cos(t).

But since you like the sine model, why not do the Fourier transform? Check the FFT, fast Fourier transform. (Then you will get several sine and cosine terms that will approximate the true function. In principle the number of terms will go to infinity, but maybe two or three terms will give a good approximation.)

We have discussed “circular data” here several times but I am not sure if that would be helpful in this case.

Also with 15.000 observations it is possible to do “dummy coding” for each of the 24 hours, and even for all week, so that (7*24 -1) = 167 dummy variables (and saving one for the intercept).

There can be other seasonal factors over the year. One explanatory factor is heat waves and the presence of influenza and the air pollution level. So there are a lot of published papers for theese risk factors and “arrivals to emergency departments”.

An other common modelling strategy is to use a GAM (generalised additive model, with the mgcv package in R) for the mean intensity of the arrival rate.