We need some help with a Poisson regression. Our problem is to determine whether there is a relationship between firms' return on equity (ROE) and the number of indicators of a specific type (Y) shown in the annual report. We started with 118 observations and, after some adjustments, were left with 104 firms.

To check whether there is a linear relationship between the log of the mean of Y (Y is assumed to be Poisson distributed) and the predictor ROE, we built 8 classes (clusters) based on the distribution of ROE. For each class we computed the mean ROE (ROEm), the mean number of indicators used (ym) and the total number of indicators used. We then plotted log_ym against ROEm; the relationship was not linear but quadratic (Figure 1).
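The per-class summary described above can be sketched as follows. This is only an illustration under assumptions: the post does not say how the 8 classes were cut, so equal-frequency (quantile) classes are used here, the function name `summarize_by_roe_class` is hypothetical, and the data are synthetic stand-ins for the 104 firms.

```python
import numpy as np

def summarize_by_roe_class(roe, y, n_classes=8):
    """Group firms into ROE classes and return one row per class:
    columns are n (firms), ROEm (mean ROE), ym (mean count), total (sum of counts)."""
    roe = np.asarray(roe, dtype=float)
    y = np.asarray(y, dtype=float)
    # ASSUMPTION: equal-frequency classes; the original binning rule is not stated.
    edges = np.quantile(roe, np.linspace(0.0, 1.0, n_classes + 1))
    # Interior edges split the firms into labels 0 .. n_classes - 1.
    labels = np.searchsorted(edges[1:-1], roe, side="right")
    rows = []
    for k in range(n_classes):
        mask = labels == k
        if not mask.any():
            continue  # skip empty classes (possible with many tied ROE values)
        rows.append((mask.sum(), roe[mask].mean(), y[mask].mean(), y[mask].sum()))
    return np.array(rows)

# Synthetic data only, NOT the data from the post.
rng = np.random.default_rng(0)
roe = rng.normal(0.08, 0.10, size=104)
y = rng.poisson(np.exp(1.0 + 2.0 * roe - 8.0 * roe**2))

summary = summarize_by_roe_class(roe, y)
n, roem, ym, total = summary.T
```

Plotting `np.log(ym)` against `roem` would then give the kind of diagnostic plot described above, provided no class has a mean count of zero.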

Based on this, we decided to fit a Poisson regression to the entire dataset, with Y as the dependent variable and ROE and ROE^2 as predictors.
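For reference, a firm-level fit of this kind can be sketched with plain NumPy using iteratively reweighted least squares, the standard fitting scheme for a log-link Poisson model. This is a sketch, not the software used for the figures: the function name `fit_poisson` is hypothetical, the data are synthetic, and in practice a statistics package would be the usual choice.

```python
import numpy as np

def fit_poisson(X, y, offset=None, n_iter=100, tol=1e-8):
    """Fit log E[y] = X @ beta + offset by iteratively reweighted least squares.
    Returns the coefficients and the fitted means."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # The offset, if any, enters the linear predictor directly (log scale).
    offset = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    mu = y + 0.5                 # starting means, kept away from zero
    eta = np.log(mu)             # starting linear predictor
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu      # working response
        XtW = X.T * mu                        # working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu

# Synthetic data only, NOT the data from the post.
rng = np.random.default_rng(1)
roe = rng.normal(0.08, 0.10, size=104)
y = rng.poisson(np.exp(1.0 + 2.0 * roe - 8.0 * roe**2))

# Design matrix: intercept, ROE, ROE^2
X = np.column_stack([np.ones_like(roe), roe, roe**2])
beta, mu = fit_poisson(X, y)
```

At convergence the score equations `X.T @ (y - mu)` are numerically zero, which is a quick sanity check on the fit.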

This model was not very good: the pseudo R^2 was very small (about 0.02) and the hypothesis of over-dispersion was accepted (Figure 2). Based on what we read in some papers, we concluded that the observations tend to cluster. We therefore decided to fit the Poisson regression to the clustered data, using the total number of indicators used by each class as the dependent variable (ym is not discrete) and ROEm and ROEm^2 as predictors. As the offset variable we used the number of cases in each cluster.
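Written out, the grouped model and the usual over-dispersion check look roughly like this. The notation is an assumption introduced here for illustration (T_k is the total for class k, n_k its number of firms, N the number of observations in the fit, p the number of coefficients). Two caveats: in a log-link model the offset enters the linear predictor as log n_k rather than n_k, so whether to pass the raw count or its log depends on how the software defines "offset" versus "exposure"; and treating the class total as Poisson with mean n_k times a function of ROEm_k assumes ROE is roughly constant within a class.

```latex
% Grouped Poisson model with the class size as exposure
T_k \sim \mathrm{Poisson}(\mu_k), \qquad
\log \mu_k = \log n_k + \beta_0 + \beta_1\,\mathrm{ROEm}_k + \beta_2\,\mathrm{ROEm}_k^2

% Pearson dispersion estimate for a fit with N observations and p coefficients
\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{N} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}
```

A value of the dispersion estimate close to 1 is consistent with the Poisson assumption, while values well above 1 point to over-dispersion. With 8 classes and 3 coefficients only 5 residual degrees of freedom remain, so the estimate is quite noisy at the class level.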

The output of the model is summarized in Figure 3.

Is our approach correct?

Are there simpler methods?

Are 8 clusters too few? Results get worse if we use, for example, 14 clusters instead of 8. Does this mean that the model doesn't fit well?

Thank you in advance!:wave::wave: