How to interpret a reliability table 1 – reliability, resolution and skill

The diagonal line is the line along which forecast probability equals observed frequency; points on it indicate that the probability forecast is reliable. Points to the right (left) of that line indicate overforecasting (underforecasting). Points within the green area indicate that the forecasts have positive skill. The diagonal edge of the green area is the locus of points where the reliability component of the Brier skill score exactly matches the resolution component, so that skill goes to 0. This is sometimes called the "no skill line". To understand this, recall the decomposition of the Brier score and the definition of the Brier skill score from the previous unit.
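The equations from the previous unit are not reproduced on this page. As a reminder, one standard way of writing them is sketched below. The symbols are notation assumed here: REL, RES and UNC stand for reliability, resolution and uncertainty, \bar{o} is the base rate, and the reference forecast is assumed to be the sample climatology.

```latex
\mathrm{BS} = \mathrm{REL} - \mathrm{RES} + \mathrm{UNC},
\qquad \mathrm{UNC} = \bar{o}\,(1 - \bar{o})

\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{clim}}}
             = \frac{\mathrm{RES} - \mathrm{REL}}{\mathrm{UNC}},
\qquad \mathrm{BS}_{\mathrm{clim}} = \mathrm{UNC}
```

Written this way, skill is positive when resolution exceeds reliability and is zero when the two are equal.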

Reliability is the mean squared difference between the forecast probability and the observed frequency, weighted according to the number of cases in each bin. Graphically, that is the weighted mean squared vertical distance from the points on the reliability curve to the diagonal line.
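In symbols, the definition above can be sketched as follows. The notation is assumed here: K bins, n_k cases in bin k, N cases in total, p_k the forecast probability of bin k, and \bar{o}_k the observed frequency in that bin.

```latex
\mathrm{REL} = \frac{1}{N} \sum_{k=1}^{K} n_k \,\bigl(p_k - \bar{o}_k\bigr)^2
```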

Resolution is the mean squared difference between the observed frequencies and the sample mean frequency (base rate), again weighted according to the number of cases in each bin. Graphically, this is the weighted mean of the squared vertical distances from the points on the reliability curve to the horizontal line representing the base rate.
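In symbols, the definition above can be sketched as follows. The notation is assumed here: K bins, n_k cases in bin k, N cases in total, \bar{o}_k the observed frequency in bin k, and \bar{o} the base rate over the whole sample.

```latex
\mathrm{RES} = \frac{1}{N} \sum_{k=1}^{K} n_k \,\bigl(\bar{o}_k - \bar{o}\bigr)^2
```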

Along the no skill line, which lies halfway (measured vertically) between the diagonal and the horizontal base rate line, the resolution and reliability components are numerically equal and cancel out in the BSS, leading to a skill score of 0.
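The cancellation described above can be checked numerically. The sketch below is illustrative only: the function name `brier_decomposition`, the bin probabilities and the bin counts are all made up for this example, and the reference forecast is assumed to be the sample climatology. Each observed frequency is placed halfway between its forecast probability and the base rate, so every point is equally far from the diagonal and from the base rate line.

```python
import numpy as np

def brier_decomposition(p, obs_freq, n):
    """Return (reliability, resolution, uncertainty, BSS) from binned data.

    p        : forecast probability of each bin
    obs_freq : observed frequency of the event in each bin
    n        : number of cases in each bin
    """
    p = np.asarray(p, dtype=float)
    obs_freq = np.asarray(obs_freq, dtype=float)
    n = np.asarray(n, dtype=float)
    total = n.sum()
    base_rate = (n * obs_freq).sum() / total
    # Weighted mean squared distance from the points to the diagonal.
    reliability = (n * (p - obs_freq) ** 2).sum() / total
    # Weighted mean squared distance from the points to the base rate line.
    resolution = (n * (obs_freq - base_rate) ** 2).sum() / total
    uncertainty = base_rate * (1.0 - base_rate)
    bss = (resolution - reliability) / uncertainty
    return reliability, resolution, uncertainty, bss

# Hypothetical bins: forecast probabilities and case counts.
p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
n = np.array([40, 30, 15, 10, 5])

# Points on the no skill line: halfway between the diagonal and the base rate.
# Using the weighted mean forecast as the base rate keeps the sample consistent.
base = (n * p).sum() / n.sum()
no_skill_obs = 0.5 * (p + base)
rel, res, unc, bss = brier_decomposition(p, no_skill_obs, n)

# For comparison: points exactly on the diagonal (perfectly reliable).
rel_d, res_d, unc_d, bss_d = brier_decomposition(p, p, n)

print(f"no skill line: REL={rel:.4f} RES={res:.4f} BSS={bss:.4f}")
print(f"diagonal:      REL={rel_d:.4f} RES={res_d:.4f} BSS={bss_d:.4f}")
```

On the no skill line the two components come out equal and the skill score vanishes, while on the diagonal the reliability term is zero and the skill is positive.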

Now, extend your ability to interpret reliability tables by completing the following exercises:

1. Below are some reliability tables. Choose the best interpretation of each and drag it over the table.

Correct. The reliability line is above the diagonal for all bins.

Correct. This form is often seen: probabilities below (above) the climatological frequency are underforecast (overforecast), and the curve begins to turn toward the horizontal.

Correct. No matter which probability is forecast, the observed frequency of occurrence is approximately the same.

Correct. The reliability curve lies along the no skill line.

Correct. The higher probability bins in particular are likely to contain few cases. Aside from obtaining a bigger sample, one can make the upper bins wider to increase the sample size in each. In any case, it would be wise to include error bars on the points of a reliability table when the sample is relatively small.

Correct. Observed frequencies cover the whole range from 0 to 1, even though the forecast probability range is restricted. The over-resolution results in a loss of reliability. This type of curve is not often seen.

Correct. To be perfect, the forecast must be both perfectly reliable and perfectly resolved, which can only happen if it is categorical and correct.

Correct. The base rate is low, indicating an uncommon event. The reliability curve terminates around 0.6, indicating that the forecaster has not attempted to forecast high probabilities of such an uncommon event, probably wisely.

Correct. For categorical forecasts, the only forecast probabilities are 0 and 1. In this case, the event happens more often when it is forecast to occur than when it isn’t, which is good. However, categorical forecasts are never reliable unless they are perfect.
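One of the feedback items above recommends error bars on the points of a reliability table when the sample is small. A minimal sketch of one way to compute them follows, assuming a normal approximation to the binomial sampling error of each bin's observed frequency. The function name `bin_error_bars`, the 1.96 multiplier (roughly a 95% interval) and the example numbers are assumptions made for illustration; the approximation is itself crude for bins with very few cases.

```python
import math

def bin_error_bars(obs_freq, counts, z=1.96):
    """Approximate (lower, upper) bounds for each bin's observed frequency.

    Uses the normal approximation to the binomial standard error,
    sqrt(o * (1 - o) / n), and clips the bounds to the range [0, 1].
    Empty bins get (nan, nan) because no frequency can be estimated.
    """
    bars = []
    for o, n in zip(obs_freq, counts):
        if n == 0:
            bars.append((float("nan"), float("nan")))
            continue
        half_width = z * math.sqrt(o * (1.0 - o) / n)
        bars.append((max(0.0, o - half_width), min(1.0, o + half_width)))
    return bars

# Hypothetical bins: the high-probability bin has only 4 cases.
obs_freq = [0.05, 0.40, 0.75]
counts = [200, 40, 4]
bars = bin_error_bars(obs_freq, counts)

for o, n, (lo, hi) in zip(obs_freq, counts, bars):
    print(f"observed frequency {o:.2f} from {n:3d} cases: [{lo:.2f}, {hi:.2f}]")
```

The interval widens sharply as the number of cases in a bin falls, which is why sparsely populated upper bins can produce an erratic-looking curve.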

