I was thinking of fitting a sinusoid to the data (using least squares), then adding a Brownian motion around the sinusoid.

But before going any further down this road, I just wondered whether this is a reasonable idea?

- Thread starter Prometheus
- Tags gam smoothing

To start with, I just want to fit a sinusoid to the data I have. This is what I did:

The data frame looks like this (15000 observations):

Code:

```
  t y
1 1 2
2 2 3
3 3 9
4 4 2
5 5 7
6 6 2
```

Code:

```
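# t = time index, y = arrivals (numeric vectors of equal length)
res <- nls(y ~ A * sin(omega * t + phi) + C,
           data  = data.frame(t, y),
           start = list(A = 13, phi = 0, omega = 2 * pi / 24, C = 5))
co <- coef(res)
summary(res)
# Sinusoid evaluated at x for given amplitude a, frequency b, phase c, level d
fit <- function(x, a, b, c, d) a * sin(b * x + c) + d
plot(x = t, y = y)
curve(fit(x, a = co["A"], b = co["omega"], c = co["phi"], d = co["C"]),
      add = TRUE, lwd = 2, col = "steelblue")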
```

Any tips?

View attachment 5499

If it looks like there is a non-constant trend, then that's entirely spurious. When talking about trend you want to downweight the extremes. If removing a few of the extreme values completely changes what you think about the "trend", then it probably wasn't very strong anyway.

To start with, I'm insisting on it based on knowledge of the system. The data are arrivals at an emergency department, which should have a distinctly sinusoidal pattern following the diurnal rhythm of humans. (I also anticipate a weekly frequency, but one thing at a time.)

I know the data looks like a mess, but based on the graph I've attached (average number of arrivals per hour), I thought a sinusoid was reasonable.

To start with, I'm insisting on it based on knowledge of the system.

A Brownian motion is normally distributed, right? Isn't it more natural to use a discrete distribution that is non-negative like the Poisson distribution or the negative binomial?
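Here is a rough sketch of what that could look like, assuming hourly counts and a daily sinusoidal mean. The values of `A`, `C`, `phi` and `size` are made up for illustration and are not estimated from the data in this thread. Counts are drawn from a Poisson or a negative binomial distribution around the sinusoid, so they stay non-negative integers:

```r
set.seed(1)
t     <- 1:(24 * 14)                      # two weeks of hourly time stamps (assumed)
omega <- 2 * pi / 24                      # daily cycle
A <- 3; C <- 5; phi <- 0                  # hypothetical amplitude, level, phase

lambda <- C + A * sin(omega * t + phi)    # mean arrivals per hour
stopifnot(all(lambda > 0))                # a count mean must stay positive

y_pois <- rpois(length(t), lambda = lambda)           # variance equals the mean
y_nb   <- rnbinom(length(t), mu = lambda, size = 2)   # overdispersed: var = mu + mu^2/size

c(mean = mean(y_pois), var = var(y_pois))
c(mean = mean(y_nb),   var = var(y_nb))
```

The negative binomial version is only worth the extra parameter if the observed variance is clearly larger than the mean.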

If you like the sine model, then start with that! But isn't it true that the model “A*sin(omega*t+phi)+C” = a + b1*sin(omega*t) + b2*cos(omega*t)? I am too lazy to check whether that equals sign holds exactly. But if it does, and omega is fixed at a known value (2*pi/24 for a daily cycle), then you don't need to run a non-linear regression, as the model can be expressed as a linear one: y = a + b1*x1 + b2*x2 + eps, where x1 = sin(omega*t) and x2 = cos(omega*t).
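For what it's worth, the angle-addition formula gives A*sin(omega*t + phi) = A*cos(phi)*sin(omega*t) + A*sin(phi)*cos(omega*t), so the linear form works once omega is treated as known. A minimal sketch on simulated data (not the data from this thread; the "true" values 5, 3 and 0.7 are arbitrary), assuming a 24-hour period:

```r
set.seed(2)
omega <- 2 * pi / 24                         # assumed known: one cycle per 24 hours
t <- 1:(24 * 30)
y <- 5 + 3 * sin(omega * t + 0.7) + rnorm(length(t))   # simulated series

# Linear-in-parameters form: b1 = A*cos(phi), b2 = A*sin(phi)
x1 <- sin(omega * t)
x2 <- cos(omega * t)
m  <- lm(y ~ x1 + x2)

b   <- coef(m)
A   <- sqrt(b["x1"]^2 + b["x2"]^2)           # recover the amplitude
phi <- atan2(b["x2"], b["x1"])               # recover the phase
round(c(C = unname(b[1]), A = unname(A), phi = unname(phi)), 2)
```

Unlike `nls`, this needs no starting values, and a weekly cycle can be added as two more columns using omega = 2*pi/168.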

But since you like the sine model, why not do the Fourier transform? Check the FFT, fast Fourier transform. (Then you will get several sine and cosine terms that will approximate the true function. In principle the number of terms will go to infinity, but maybe two or three terms will give a good approximation.)
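As a sketch of how the FFT can be used to find the dominant cycles, here is a periodogram computed with base R's `fft()` on simulated hourly data with a daily and a weaker weekly component. The series length and amplitudes are assumptions chosen for illustration:

```r
set.seed(3)
n <- 24 * 7 * 8                              # eight weeks of hourly data (assumed)
t <- 0:(n - 1)
y <- 5 + 3 * sin(2 * pi * t / 24) + sin(2 * pi * t / 168) + rnorm(n)   # simulated

Y     <- fft(y - mean(y))                    # remove the mean so frequency 0 does not dominate
k     <- 1:(n / 2)                           # positive frequencies only
power <- Mod(Y[k + 1])^2 / n                 # periodogram ordinates
period_hours <- n / k                        # period belonging to each frequency index

top <- order(power, decreasing = TRUE)[1:2]
period_hours[top]                            # the two strongest periods, in hours
```

With real data the peaks will be less clean, but strong spikes near 24 and 168 hours would support the daily and weekly terms.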

We have discussed “circular data” here several times but I am not sure if that would be helpful in this case.

Also, with 15,000 observations it is possible to do “dummy coding” for each of the 24 hours, or even for each hour of the week, which gives 7*24 - 1 = 167 dummy variables (one level is absorbed by the intercept).
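A minimal sketch of the hour-of-week dummy coding, on simulated counts and assuming the time index counts hours from the start of a week. The variable name `how` (hour of week) is hypothetical, and the Poisson family is my choice here; `lm` would give the same fitted means in this saturated setup:

```r
set.seed(4)
n   <- 24 * 7 * 20                           # twenty weeks of hourly counts (assumed)
t   <- 0:(n - 1)
how <- factor(t %% 168)                      # hour of week, 168 levels
y   <- rpois(n, lambda = 5 + 3 * sin(2 * pi * t / 24))   # simulated counts

m <- glm(y ~ how, family = poisson)          # R builds the 167 dummies from the factor
length(coef(m))                              # intercept + 167 dummy coefficients

# Estimated mean arrivals for each hour of the week
nd      <- data.frame(how = factor(levels(how), levels = levels(how)))
profile <- predict(m, newdata = nd, type = "response")
```

This makes no assumption about the shape of the daily curve, at the cost of many more parameters than the sinusoid.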

There can be other seasonal factors over the year. Explanatory factors include heat waves, the presence of influenza and the air pollution level. There are a lot of published papers on these risk factors and “arrivals to emergency departments”.

Another common modelling strategy is to use a GAM (generalised additive model, with the mgcv package in R) for the mean intensity of the arrival rate.
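A rough sketch of such a GAM with `mgcv`, on simulated hourly counts. The cyclic cubic spline (`bs = "cc"`), the basis size `k = 10`, and the knots at 0 and 24 are my assumptions for a daily pattern, not something taken from the data in this thread:

```r
library(mgcv)                                # recommended package shipped with R

set.seed(5)
n    <- 24 * 60                              # sixty days of hourly counts (assumed)
t    <- 0:(n - 1)
hour <- t %% 24
y    <- rpois(n, lambda = 5 + 3 * sin(2 * pi * t / 24))   # simulated counts

# Cyclic cubic spline so the fitted curve joins up between hour 24 and hour 0
m <- gam(y ~ s(hour, bs = "cc", k = 10),
         family = poisson,
         knots  = list(hour = c(0, 24)))

rate <- predict(m, newdata = data.frame(hour = 0:23), type = "response")
round(rate, 1)                               # estimated mean arrivals by hour of day
```

A second cyclic smooth for day of week, or covariates such as temperature, could be added as further terms in the formula.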