Hello everyone,
I need some advice on how to predict the number of messages that will be posted in a forum, based on the number of messages posted on the same forum each month over the previous 10 years. Below is a graph of the number of messages previously posted:
What I would like is to identify a "trend" that would allow me to tell how many messages will be posted in the following months (obviously, I would do that on newer data). I was thinking about doing some sort of regression but my skills in statistics are not that great. What do you think?
Thank you!
You will likely be looking at time series analysis, which I have no experience with.
Do you have any potential predictors that may help explain the above fluctuations?
OK, thank you. I had never heard of time series analysis; I will look into it. I don't have any predictors. Maybe the month of the year has an influence on the number of messages posted, but I don't know for now.
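One quick way to look for a month-of-year effect is to remove most of the trend and then average the changes by calendar month. The following is only a rough sketch under stated assumptions: it uses R's built-in AirPassengers series as a stand-in for the forum data, and the names x, dlx and month_effect are made up for the illustration.

```r
# Stand-in monthly series (built-in dataset); replace with your own ts object.
x <- AirPassengers

# Month-over-month change in log counts, which removes most of the trend.
dlx <- diff(log(x))

# Average change for each calendar month (1 = Jan, ..., 12 = Dec).
month_effect <- tapply(dlx, cycle(dlx), mean)
print(round(month_effect, 3))

# Visual check: one sub-series per calendar month.
monthplot(dlx)
```

If some months are consistently above or below the others, that suggests a seasonal pattern worth including in the model.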
For those who may be interested, I think I figured out how to do it. I used ARIMA. I don't know if that is correct (please tell me if it is not!), but this is what I did:
- I entered my data in R as a time series
- I used the auto.arima() function (in the forecast library) to find the optimal parameters for the ARIMA model
- I generated the model
- I applied the forecast function to it
Here is the code:
Code:
> vec = c(14,14,6,6,18,6,3,11,19,40,25,24,35,18,37,51,55,74,39,50,75,127,116,239,125,174,249,295,473,435,408,834,870,1357,1684,4424,5559,5844,8167,16253,21481,21107,21977,29219,30942,35164,39167,37134,39841,42546,42088,45719,43197,49463,53292,71794,69769,81344,72821,76963,79017,78711,79111,78277,82376,81930,82682,109876,116350,143995,158316,185915,169616,163694,156993,174117,179635,203711,183714,226372,231351,264537,229456,211828,234205,188645,202730,202623,211995,228025,208926,246247,207021,204611,204082,190179,180224,160862,157919,170342,171995,160736,145481,168716,159044,159673,158128,154751,139266,139699,144129,155927,160554,173098,172859,195933,170872,192772,163560,154206,133142,130577,137756,129598,140794,133556,138111,151234,127174,162714,144866,127285,124480,132021,139130,112157,121962,106806,109051,121430,110299,114485,105224,103578,88850,93361,88971,78878,82327,85312,67054,78447,74901,74718,64201,64709,55960,53437,50360,49388,50932,46196,48206,55148,53573,49467,41345,46502,38052,31148,30166,31662,36370,39369,36745,35372,33431,36656)
> myts <- ts(vec, start=c(2000, 6), end=c(2015, 3), frequency=12)
> myts
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2000                                        14     14      6      6     18      6      3
2001     11     19     40     25     24     35     18     37     51     55     74     39
2002     50     75    127    116    239    125    174    249    295    473    435    408
2003    834    870   1357   1684   4424   5559   5844   8167  16253  21481  21107  21977
2004  29219  30942  35164  39167  37134  39841  42546  42088  45719  43197  49463  53292
2005  71794  69769  81344  72821  76963  79017  78711  79111  78277  82376  81930  82682
2006 109876 116350 143995 158316 185915 169616 163694 156993 174117 179635 203711 183714
2007 226372 231351 264537 229456 211828 234205 188645 202730 202623 211995 228025 208926
2008 246247 207021 204611 204082 190179 180224 160862 157919 170342 171995 160736 145481
2009 168716 159044 159673 158128 154751 139266 139699 144129 155927 160554 173098 172859
2010 195933 170872 192772 163560 154206 133142 130577 137756 129598 140794 133556 138111
2011 151234 127174 162714 144866 127285 124480 132021 139130 112157 121962 106806 109051
2012 121430 110299 114485 105224 103578  88850  93361  88971  78878  82327  85312  67054
2013  78447  74901  74718  64201  64709  55960  53437  50360  49388  50932  46196  48206
2014  55148  53573  49467  41345  46502  38052  31148  30166  31662  36370  39369  36745
2015  35372  33431  36656
> library(forecast)
> auto.arima(myts)
Series: myts
ARIMA(4,2,0)(0,0,2)[12]

Coefficients:
          ar1      ar2      ar3      ar4    sma1    sma2
      -1.0746  -0.7222  -0.4778  -0.2572  0.2820  0.1347
s.e.   0.0731   0.1029   0.1039   0.0727  0.0764  0.0680

sigma^2 estimated as 138366184:  log likelihood=-1897.57
AIC=3809.13   AICc=3809.8   BIC=3831.33
> # Non-seasonal order (4,2,0) and seasonal order (0,0,2), as selected by auto.arima
> fit <- Arima(myts, order=c(4,2,0), seasonal=c(0,0,2))
> plot(forecast(fit))
And here is the result:
[forecast plot]
Once again, I don't know if that is correct, but the result looks pretty good.
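One common way to check whether a forecast like this is reasonable is to hold out the most recent months, fit on the rest, and compare the forecast with what actually happened. The sketch below is only an illustration under assumptions: it uses base R's arima() and predict() on the built-in AirPassengers series (so it runs without extra packages), and the model orders shown are placeholders, not a recommendation for the forum data. With the forecast package, the same idea should work with Arima() and forecast().

```r
# Stand-in monthly series (built-in dataset); substitute your own ts object.
x <- log(AirPassengers)

# Hold out the last 12 months (1960) for testing.
train <- window(x, end = c(1959, 12))
test  <- window(x, start = c(1960, 1))

# Illustrative orders only; use the orders selected for your own series.
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the held-out months and compare on the original scale.
pred <- predict(fit, n.ahead = length(test))$pred
mae  <- mean(abs(exp(pred) - exp(test)))
cat("Mean absolute error on held-out months:", round(mae, 1), "\n")
```

If the error on the held-out months is small relative to the typical monthly count, that is some evidence the model is capturing the pattern and not just fitting the past.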