ARIMA and regression forecast

#1
Dear All,

The data below are weekly T-shirt sales for weeks 14 to 35 of 2011 and weeks 14 to 34 of 2012. I used a dummy variable (dummys) to flag special-day sales: week 25 is Father's Day, and week 35 in 2011 and week 34 in 2012 are a special sales event (it moves one week earlier every year). I tried two models: one is auto.arima with regressors and the other is regression only. My questions are included as comments in the code.
ff=read.table(header=TRUE, text="
RN store   year     month   week    salesunit   avgStock    avgprice dummys
1   11     2011       4      14       256        1016.86     50.26    0
2   11     2011       4      15       403        1327.71     49.02    0
3   11     2011       4      16       337        1682.29     50.90    0
4   11     2011       4      17       386        2064.86     50.91    0
4   11     2011       4      18       428        2235.43     48.80    0
5   11     2011       5      19       469        2086.57     50.63    0
6   11     2011       5      20       531        2170.14     50.34    0
7   11     2011       5      21       614        1941.14     51.76    0
8   11     2011       5      22       459        2099.29     51.48    0
9   11     2011       6      23       583        2739.57     50.57    0
10  11     2011       6      24       776        3255.86     49.92    0
11  11     2011       6      25       947        3656.86     53.83    1
12  11     2011       6      26       495        4368.71     50.68    0
13  11     2011       7      27       666        4558.43     44.18    0 
14  11     2011       7      28       780        3922.00     41.36    0
15  11     2011       7      29       721        3997.86     42.24    0
16  11     2011       7      30       762        3895.71     41.98    0
17  11     2011       7      31       745        3661.29     41.12    0
18  11     2011       8      32       617        3152.86     42.10    0
19  11     2011       8      33       551        2563.86     40.99    0
20  11     2011       8      34       599        2035.14     36.76    0
21  11     2011       8      35       717        1734.29     37.41    1
22  11     2012       4      14       208         581.86     55.95    0
23  11     2012       4      15       287        1013.71     56.05    0
24  11     2012       4      16       230        1806.14     54.93    0
25  11     2012       4      17       370        2250.57     53.53    0
26  11     2012       4      18       576        2287.00     52.53    0
27  11     2012       5      19       652        2498.71     53.57    0
28  11     2012       5      20       611        2533.57     53.81    0
29  11     2012       5      21       468        2486.00     54.53    0
30  11     2012       5      22       385        2726.00     56.85    0
31  11     2012       6      23       366        2961.43     58.21    0
32  11     2012       6      24       509        2960.29     53.12    0
33  11     2012       6      25       875        2742.57     51.00    1
34  11     2012       6      26       464        3210.86     52.18    0
35  11     2012       7      27       534        3586.14     51.03    0
36  11     2012       7      28       588        3627.43     48.75    0
37  11     2012       7      29       628        3493.00     44.54    0
38  11     2012       7      30       636        3335.00     41.46    0
39  11     2012       7      31       533        3138.57     43.23    0
40  11     2012       8      32       689        2910.57     41.23    0
41  11     2012       8      33       775        2698.14     41.61    0
42  11     2012       8      34       946        2344.14     41.09    1
")

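library(forecast) #needed for auto.arima, Arima, forecast and accuracy
storets=ts(ff[,6]) #column 6 is salesunit
lts=length(storets)
test=window(storets,end=lts-4) #obs 1-39 are used to fit the model; the last 4 weeks are held out for validation
check=window(storets,start=lts-3) #obs 40-43, the hold-out weeks
dreg=data.frame(ff,model.matrix(~as.factor(ff$month))[,2:5]) #add month dummies (month 4 is the baseline)
xregmodel=as.matrix(dreg[-((lts-3):lts),7:13]) #keep obs 1-39 to model; xreg must be a numeric matrix
xregforecast=as.matrix(dreg[(lts-3):lts,7:13]) #keep 40-43 for forecasting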

modelar=auto.arima(test,xreg=xregmodel,lambda=0) #auto-arima with regression
#The result from auto.arima is ARIMA(0,0,0) with non-zero mean. Does that mean ARIMA adds nothing here
#and I should use only regression?

armaf=forecast(modelar,h=4,xreg=xregforecast)
accuracy(armaf,check) #MAPE is 27%. Is there a way to reduce it with an ARIMA model?

#I also want to try regression only, maybe with lagged or differenced variables:
install.packages("dynlm")
library("dynlm")
#L() only builds a true lag on a time-series object, so fit on obs 1-39 converted to ts
modelr=dynlm(salesunit ~ avgStock+avgprice+dummys+L(salesunit,1),data=ts(ff[1:39,]))
summary(modelr) 
summary(modelr)$coefficients
summary(modelr)$r.squared
#Questions:
#1. How do I forecast with modelr and measure MAPE?
#2. Is modelr the same as the model below (since it uses only one lag)?
modelar1=Arima(test,order=c(1,0,0),xreg=xregmodel,lambda=0)
armaf1=forecast(modelar1,h=4,xreg=xregforecast)
accuracy(armaf1,check)  
#Does anybody have suggestions for other model combinations?
Thank you.
 
#2
You're facing the same problem I am. This is count data and is prone to dispersion effects (overdispersion). You need to model it as a Poisson or negative binomial (NB) process, provided my intuition is correct.
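
For what it's worth, here is a minimal sketch of what modelling against a Poisson or NB process could look like in R. It is only an illustration under assumptions: it uses just the 2011 rows from the first post, keeps only avgprice and dummys as regressors, and the object names (sales, pois, nb, dispersion) are made up for the example. glm() is base R and glm.nb() comes from the MASS package.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

# 2011 rows from the first post (weeks 14-35); a subset chosen only to keep this short
sales <- data.frame(
  salesunit = c(256, 403, 337, 386, 428, 469, 531, 614, 459, 583, 776,
                947, 495, 666, 780, 721, 762, 745, 617, 551, 599, 717),
  avgprice  = c(50.26, 49.02, 50.90, 50.91, 48.80, 50.63, 50.34, 51.76,
                51.48, 50.57, 49.92, 53.83, 50.68, 44.18, 41.36, 42.24,
                41.98, 41.12, 42.10, 40.99, 36.76, 37.41),
  dummys    = c(rep(0, 11), 1, rep(0, 9), 1)
)

# Poisson regression: assumes the variance equals the mean
pois <- glm(salesunit ~ avgprice + dummys, family = poisson, data = sales)

# Rough overdispersion check: Pearson chi-square divided by residual degrees of freedom
dispersion <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
print(dispersion)

# Negative binomial regression: adds a parameter (theta) so the variance can exceed the mean
nb <- glm.nb(salesunit ~ avgprice + dummys, data = sales)
print(summary(nb))
```

A dispersion ratio well above 1 would suggest the plain Poisson model understates the variance, which is when the NB fit (or family = quasipoisson) is worth a look.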
 
#4
Quote: "This is count data and is prone to dispersionary effects"
What do you mean?
The higher the count, the higher the variance.

Quote: "a poisson or nb process"
How?
Not sure, I'm searching for that myself.

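That said, you should be able to compensate for this with a log transformation, which stabilizes a variance that grows with the mean. It changes the interpretation: the coefficients then describe approximate percentage changes in sales rather than changes in units.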
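
A rough sketch of the log-transformation route, which also shows one way to get a hold-out MAPE from a plain regression. The assumptions are mine: only the 2011 rows from the first post are used, the last 4 weeks are held out, only avgprice and dummys are regressors, and the names (sales, train, hold, fit, pred, mape) are just for the example.

```r
# 2011 rows from the first post (weeks 14-35); a subset chosen only to keep this short
sales <- data.frame(
  salesunit = c(256, 403, 337, 386, 428, 469, 531, 614, 459, 583, 776,
                947, 495, 666, 780, 721, 762, 745, 617, 551, 599, 717),
  avgprice  = c(50.26, 49.02, 50.90, 50.91, 48.80, 50.63, 50.34, 51.76,
                51.48, 50.57, 49.92, 53.83, 50.68, 44.18, 41.36, 42.24,
                41.98, 41.12, 42.10, 40.99, 36.76, 37.41),
  dummys    = c(rep(0, 11), 1, rep(0, 9), 1)
)

train <- sales[1:18, ]   # weeks used to fit the model
hold  <- sales[19:22, ]  # last 4 weeks held out for validation

# Linear regression on log sales
fit <- lm(log(salesunit) ~ avgprice + dummys, data = train)

# Back-transform the predictions to units; exp() of a log-scale prediction
# estimates the median rather than the mean, so it can sit slightly low
pred <- exp(predict(fit, newdata = hold))

# Mean absolute percentage error on the hold-out weeks
mape <- mean(abs((hold$salesunit - pred) / hold$salesunit)) * 100
print(mape)

# Percentage effect of a special-day week implied by the dummy coefficient
special_day_pct <- unname(100 * (exp(coef(fit)["dummys"]) - 1))
print(special_day_pct)
```

The same hold-out pattern should work for any model that has a predict() method; only the back-transformation step is specific to the log model.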