Some suggest you choose your ESM (exponential smoothing) model, e.g., Holt-Winters, based on whether the data looks like it will support that model: if it has a trend but no seasonality, you choose Holt; if it has both, Holt-Winters. Others suggest you use a holdout sample (say, the last year) to see which of these models predicts best. With my data, these two ways of choosing often lead to different models. That is, the data has both a trend and seasonality, but simple exponential smoothing actually does a better job of predicting the holdout data based on MAPE.

So which is the better approach: using a holdout sample, or looking at the nature of the historical data?

We need some help with a Poisson regression. Our problem is to determine whether there is a relationship between firms' return on equity (ROE) and the number of indicators of a specific type (Y) shown in the annual report. We started with 118 observations and, after some adjustments, were left with 104 firms. To check whether there is a linear relationship between Y (Poisson distributed) and the predictor ROE, we built 8 clusters based on the distribution of ROE and computed for each class the mean of ROE (ROEm), the mean number of indicators used (ym), and the total number of indicators used by the class. We then plotted log_ym against ROEm; their relationship was not linear but quadratic (Figure 1).
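The binning step you describe can be sketched in pandas as below. The DataFrame `df` and its column names (`ROE`, `Y`) are assumptions on our part, and the simulated data is illustrative only; `pd.qcut` forms 8 equal-count classes from the ROE distribution.

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the 104-firm dataset (illustrative only):
# counts generated with a quadratic log-rate in ROE
rng = np.random.default_rng(1)
df = pd.DataFrame({"ROE": rng.normal(0.1, 0.05, 104)})
df["Y"] = rng.poisson(np.exp(1.0 + 2 * df["ROE"] - 5 * df["ROE"] ** 2))

# 8 classes with (roughly) equal numbers of firms, by ROE quantiles
df["cluster"] = pd.qcut(df["ROE"], q=8, labels=False)
summary = df.groupby("cluster").agg(
    ROEm=("ROE", "mean"),   # class mean of ROE
    ym=("Y", "mean"),       # class mean of the indicator count
    total=("Y", "sum"),     # total indicators in the class
    n=("ROE", "size"))      # number of firms in the class
summary["log_ym"] = np.log(summary["ym"])
print(summary)
```

Plotting `summary["log_ym"]` against `summary["ROEm"]` reproduces the diagnostic plot you describe for Figure 1.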

Based on this, we decided to fit a Poisson regression on the entire dataset with Y as the dependent variable and ROE and ROE^2 as predictors.

This model was not very good: the pseudo-R^2 was very small (about 0.02) and the hypothesis of overdispersion was accepted (Figure 2). From some papers we read, we concluded that there is a tendency for observations to cluster. At that point we decided to fit the Poisson regression on the clustered data, using as the dependent variable the total number of indicators used by each class (ym is not discrete) and as predictors ROEm and ROEm^2. As the offset variable we used the number of cases in each cluster.

The output of the model is summarized in Figure 3.

Is our approach correct?

Are there simpler methods?

Are 8 clusters too few? Results get worse if we use, for example, 14 clusters instead of 8. Does that mean the model doesn't fit well?

Thank you in advance!