Moving average: Smoothing until stable


I am looking at a score in discourse over equally spaced intervals. Basically this is a time series. I want to view this as a line graph using the cumulative average. HEre's the ugly code that I used in R and the plot:

dat <- data.frame(value=rnorm(1000, sd=10), time=1:1000)

len <- nrow(dat)
dat$moving_ave <- rep(NA, len)

for(i in 1:len) {
    dat$moving_ave[i] <- mean(dat$value[1:i])

ggplot(dat, aes(y=moving_ave, x = time)) + geom_smooth() + 
    geom_line(size=.8, color="black")

The problem is that the average doesn't become stable (rnorm should give a rough line around 0 but the wide stan dev. causes the beginning of the graph to have huge jumps; i.e., the mean is grossly effected by the small n). This is not a new problem and I see there's weighted means to deal with this problem:

However, even with these weighted means the mean is not stable and is only slightly improved. So I tried the TTR without really digging into time series (not my area and I realize this is huge so I need direction of where to go). I played with the following tries at weighting:

dat$SMA <- SMA(dat$value)
dat$EMA <- EMA(dat$value)
dat$WMA <- WMA(dat$value)
dat$DEMA <- DEMA(dat$value)
dat$ZLEMA <- ZLEMA(dat$value)

mdat <- melt(dat, id=c('value', 'time', 'moving_ave'), variable="type")
ggplot(mdat, aes(y=value, x = time, group=type, color=type)) + 
    geom_smooth() + 
    geom_line(size=.5) +
    geom_line(size=.5, aes(y=moving_ave), color="black", shape=2) +

This seemed to make it worse. Help lead me to smooth this front end of the data in a reasonable way.


Ambassador to the humans
I don't understand what your issue is.

For loops are icky - here is a nicer way to get the rolling mean.
cummean <- function(x){cumsum(x)/seq_along(x)}


My problem is that the moving average is less stable in the beginning. How can I make it smoother?

PS thanks on the non-for loop


Ambassador to the humans
What is your goal? I mean the rolling average is of course going to be 'less stable' (higher variance) at the beginning - there aren't as many data points to go into the average. I don't see that there is anything we can do unless you allow yourself to use future information or prior information as well.


OK I think that's the info I want. Thanks Dason. I was thinking there was a way to borrow from later data (influence the beginning) until it becomes stable.


Ambassador to the humans
Well if you can borrow from later data and your model says that all the data is independent, normals with the same mean and variance... then your best 'smooth' is just draw a line at the sample mean. Doesn't get any more smooth than that.


True. Just for reference here was what I am doing:

Here's the example from the qdap vignette:

poldat4 <- with(rajSPLIT, polarity(dialogue, act, constrain = TRUE))

polcount <- counts(poldat4)$polarity
len <- length(polcount)

cumpolarity_mean <- cumpolarity_median <- rep(NA, len)

cummean <- function(x){cumsum(x)/seq_along(x)}

cumpolarity <- data.frame(cum_mean = cummean(polcount), Time=1:len)

## Calculate background rectangles
ends <- cumsum(rle(counts(poldat4)$act)$lengths)
starts <- c(1, head(ends + 1, -1))
rects <- data.frame(xstart = starts, xend = ends, 
    Act = c("I", "II", "III", "IV", "V"))

ggplot() + theme_bw() +
    geom_rect(data = rects, aes(xmin = xstart, xmax = xend, 
        ymin = -Inf, ymax = Inf, fill = Act), alpha = 0.2) +
    geom_smooth(data = cumpolarity, aes(y=cum_mean, x = Time)) +
    geom_hline(y=mean(polcount), color="grey30", size=1, alpha=.3, linetype=2) + 
    annotate("text", x = mean(ends[1:2]), y = mean(polcount), color="grey30", 
        label = "Average Polarity", vjust = .3, size=3) +
    geom_line(data = cumpolarity, aes(y=cum_mean, x = Time), size=1) +
    ylab("Cumulative Average Polarity") + xlab("Duration") +
    scale_x_continuous(expand = c(0,0)) +
    geom_text(data=rects, aes(x=(xstart + xend)/2, y=-.04, 
        label=paste("Act", Act)), size=3) + 
    guides(fill=FALSE) +