thread can be deleted. i made a subtle mistake in my calculations...
I have a set of 234 predictions of tennis match outcomes and 5 different prediction models. I use two different methods for calculating the Brier score, as described, for instance, here.
The first method is:

$$BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2$$

Where $N$ is the number of forecasting instances, $f_t$ is the forecast probability of the $t$-th instance, and $o_t$ is the outcome (either $0$ or $1$).
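As a minimal sketch of the first method (the mean squared difference between forecast probability and 0/1 outcome), assuming plain Python lists: the function name `brier_score` and the four data points are made up for illustration and are not taken from the spreadsheet.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n


# Hypothetical toy data (not the tennis data set).
forecasts = [0.9, 0.7, 0.7, 0.2]
outcomes = [1, 1, 0, 0]

score = brier_score(forecasts, outcomes)
print(score)
```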
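The second method decomposes the Brier score into *Resolution*, *Reliability*, and *Uncertainty*:

$$BS = \underbrace{\frac{1}{N} \sum_{k=1}^{K} n_k (f_k - \bar{o}_k)^2}_{\text{Reliability}} - \underbrace{\frac{1}{N} \sum_{k=1}^{K} n_k (\bar{o}_k - \bar{o})^2}_{\text{Resolution}} + \underbrace{\bar{o}(1 - \bar{o})}_{\text{Uncertainty}}$$

With $N$ being the total number of forecasts issued, $K$ the number of unique forecasts issued, $\bar{o}$ the observed base rate for the event to occur, $n_k$ the number of forecasts with the same probability category and $\bar{o}_k$ the observed frequency, given forecasts of probability $f_k$.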
For more details see Wikipedia.
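The three-component decomposition can be sketched as follows, under the assumption that forecasts are grouped by their exact probability value, so that each group holds identical forecasts. The function name `brier_decomposition` and the toy data are hypothetical. Note that grouping on raw floats only works when equal forecasts are stored as exactly equal values; in real data, rounding before grouping may be needed.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by unique forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # overall observed frequency of the event

    # Collect the outcomes that belong to each unique forecast probability.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f_k, group_outcomes in groups.items():
        n_k = len(group_outcomes)
        o_k = sum(group_outcomes) / n_k  # observed frequency within this group
        reliability += n_k * (f_k - o_k) ** 2
        resolution += n_k * (o_k - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical toy data (not the tennis data set).
forecasts = [0.9, 0.7, 0.7, 0.2]
outcomes = [1, 1, 0, 0]

reliability, resolution, uncertainty = brier_decomposition(forecasts, outcomes)
decomposed = reliability - resolution + uncertainty
direct = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(decomposed, direct)
```

On this toy data both routes give the same value, because every group contains only one distinct forecast probability.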
I would therefore expect both methods to yield the same result. However, I can't achieve this with my data, so there must be an error somewhere. I provide a minimal working example with my data (Google spreadsheet) where I show how I calculate the Brier score in both ways.
Here is the link to the minimal "working" example: https://goo.gl/uwUoLU
(nb: the *Resolution* in my example is 0, because in this example the data set is not split into bins, hence the *bin base rate* (aka *observed frequency*) equals the *overall observed base rate*)
I would greatly appreciate it if you could
- point me to errors in my calculation
- explain what I am doing wrong
- provide the correct calculations for both ways of calculating the Brier score
I have already spent a few hours trying to get this right, without success. As pointed out above, I suspect a slight error in how I apply the decomposed Brier score formula (second method), but I could not identify what exactly I am doing wrong. One thing I noticed is that the second formula refers to both the *total number of forecasts* and the *number of unique forecasts*. Since my example (for simplicity) uses only 1 bin, I wonder whether the *total number of forecasts* and the *number of unique forecasts* are the same here, or whether they mean something different.
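On the last point, the two counts are in general different: the total counts every forecast issued, while the number of unique forecasts counts distinct probability values. The sketch below uses made-up toy data (not the spreadsheet) and assumes that a single bin is represented by its mean forecast. It only illustrates that lumping different forecast values into one bin generally breaks the equality with the direct Brier score; it is not a claim about what went wrong in the spreadsheet.

```python
# Hypothetical toy data (not the tennis data set).
forecasts = [0.9, 0.7, 0.7, 0.2]
outcomes = [1, 1, 0, 0]

total_forecasts = len(forecasts)        # counts every forecast issued
unique_forecasts = len(set(forecasts))  # counts distinct probability values

# Direct Brier score: mean squared difference.
direct = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / total_forecasts

# Decomposition with everything lumped into ONE bin, using the bin's mean
# forecast as its probability (an assumption made for this illustration).
base_rate = sum(outcomes) / total_forecasts
mean_forecast = sum(forecasts) / total_forecasts
reliability = (mean_forecast - base_rate) ** 2  # single bin, weight n_k / N = 1
resolution = 0.0                                # bin frequency equals base rate
uncertainty = base_rate * (1 - base_rate)
one_bin = reliability - resolution + uncertainty

print(total_forecasts, unique_forecasts)
print(direct, one_bin)
```

With one bin holding several different forecast values, the two numbers disagree; they agree again once the forecasts are grouped by their unique values.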