Title: | The Calibration Simplex |
---|---|
Description: | Generates the calibration simplex (a generalization of the reliability diagram) for three-category probability forecasts, as proposed by Wilks (2013) <doi:10.1175/WAF-D-13-00027.1>. |
Authors: | Johannes Resin [aut, cre] |
Maintainer: | Johannes Resin <[email protected]> |
License: | GPL-2 |
Version: | 0.5.4 |
Built: | 2025-02-08 06:23:35 UTC |
Source: | https://github.com/cran/CalSim |
Generates an object of class calibration_simplex, which can be used to assess the calibration of ternary probability forecasts. The calibration simplex can be seen as a generalization of the reliability diagram for binary probability forecasts. For details on its interpretation, see Wilks (2013). Be aware that some minor changes have been made compared to the calibration simplex as suggested by Wilks (2013) (see note below).
As a somewhat experimental feature, multinomial p-values can be used for uncertainty quantification, that is, as a tool to judge whether observed discrepancies may be merely coincidental or whether the predictions may in fact be miscalibrated; see Resin (2023, Section 4.2).
calibration_simplex(n, p1, p2, p3, obs, test_stat, percentagewise)

## Default S3 method:
calibration_simplex(
  n = 10,
  p1 = NULL,
  p2 = NULL,
  p3 = NULL,
  obs = NULL,
  test_stat = "LLR",
  percentagewise = FALSE
)
n | A natural number, determining the number of bins (see n_bins in the Value section). |
p1 | A vector containing the forecast probabilities for the first (1) category, e.g. below-normal. |
p2 | A vector containing the forecast probabilities for the second (2) category, e.g. near-normal. |
p3 | A vector containing the forecast probabilities for the third (3) category, e.g. above-normal. |
obs | A vector containing the observed outcomes, encoded as 1 (e.g. below-normal), 2 (e.g. near-normal) and 3 (e.g. above-normal). |
test_stat | A string indicating which test statistic is to be used for the multinomial test in each bin. Options are "LLR" (log-likelihood ratio; default), "Chisq" (Pearson's chi-square) and "Prob" (probability mass statistic). See details. |
percentagewise | Logical, specifying whether probabilities are given percentagewise (summing to 100) or not (summing to 1). |
Only two of the three forecast probability vectors (p1, p2 and p3) need to be specified; the remaining one is implied, since the three probabilities sum to 1 (or to 100 if percentagewise = TRUE).
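For instance, a minimal usage sketch with simulated data (the data and variable names here are purely illustrative):

library(CalSim)
set.seed(1)
N <- 1000
p1 <- runif(N)             # forecast probability of category 1
p3 <- runif(N) * (1 - p1)  # forecast probability of category 3
# draw outcomes from the forecast itself, so the forecast is calibrated
obs <- apply(cbind(p1, 1 - p1 - p3, p3), 1,
             function(p) sample(1:3, 1, prob = p))
cs <- calibration_simplex(p1 = p1, p3 = p3, obs = obs)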
The p-values are based on multinomial tests comparing the observed frequencies within a bin with the average forecast probabilities within that bin, as outlined in Resin (2023, Section 4.2). The p-values are exact and do not rely on asymptotics; however, it is assumed that the true distribution (under the hypothesis of forecast calibration) within each bin is well approximated by the multinomial distribution. If n is small, the approximation may be poor, resulting in unreliable p-values. p-values less than 0.0001 are not exact but merely indicate a value less than 0.0001.
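To make the multinomial test concrete, here is an illustrative sketch that computes an exact p-value for a single bin by full enumeration, using the log-likelihood ratio ("LLR") statistic. This is not the package's implementation (which relies on the more efficient algorithm of Resin, 2023); the function name is hypothetical, and full enumeration is only feasible for small bin counts:

# exact multinomial p-value for observed counts x under probabilities p
exact_multinom_pval <- function(x, p) {
  n <- sum(x)
  llr <- function(k) {  # LLR statistic; 0 * log(0) is treated as 0
    e <- n * p
    2 * sum(ifelse(k > 0, k * log(k / e), 0))
  }
  t_obs <- llr(x)
  pval <- 0
  # enumerate all (k1, k2, k3) with k1 + k2 + k3 = n and sum the
  # probabilities of outcomes at least as extreme as the observed one
  for (k1 in 0:n) for (k2 in 0:(n - k1)) {
    k <- c(k1, k2, n - k1 - k2)
    if (llr(k) >= t_obs - 1e-10)
      pval <- pval + dmultinom(k, prob = p)
  }
  pval
}

# e.g. 20 forecast-outcome pairs in a bin with average forecast (0.2, 0.3, 0.5)
exact_multinom_pval(c(8, 6, 6), c(0.2, 0.3, 0.5))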
A list with class "calibration_simplex" containing:
n | As input by user or default. |
n_bins | Computed from n: the number of bins (hexagons). |
n_obs | Total number of observations. |
freq | Vector of length n_bins containing the number of observations within each bin. |
cond_rel_freq | Matrix containing the observed outcome frequencies within each bin. |
cond_ave_prob | Matrix containing the average forecast probabilities within each bin. |
pvals | Exact multinomial p-values within each bin. See details. |
Object of class calibration_simplex.
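Continuing the sketch from above, the components can be inspected directly:

cs$n_obs                # total number of forecast-observation pairs
head(cs$freq)           # number of observations per bin
head(cs$cond_rel_freq)  # observed outcome frequencies per bin
head(cs$cond_ave_prob)  # average forecast probabilities per bin
head(cs$pvals)          # exact multinomial p-values per bin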
In contrast to the calibration simplex proposed by Wilks (2013), the simplex has been mirrored at the diagonal through the bottom left hexagon. The miscalibration error is by default calculated exactly (in each bin, as the difference of the relative frequencies of each class and the average forecast probabilities) instead of approximately (using Wilks' original formula). Approximate errors can be used by setting true_error = FALSE when using plot.calibration_simplex.
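For example (reusing the cs object from the sketch above):

plot(cs)                      # exact miscalibration errors (the default)
plot(cs, true_error = FALSE)  # approximate errors, as in Wilks (2013)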
Wilks, D. S. (2013). The Calibration Simplex: A Generalization of the Reliability Diagram for Three-Category Probability Forecasts. Weather and Forecasting, 28, 1210-1218.
Resin, J. (2023). A Simple Algorithm for Exact Multinomial Tests. Journal of Computational and Graphical Statistics, 32, 539-550.
attach(ternary_forecast_example) # see also documentation of sample data
#?ternary_forecast_example

# Calibrated forecast sample
calsim0 = calibration_simplex(p1 = p1, p3 = p3, obs = obs0)
plot(calsim0, use_pvals = TRUE) # with multinomial p-values

# Overconfident forecast sample
calsim1 = calibration_simplex(p1 = p1, p3 = p3, obs = obs1)
plot(calsim1)

# Underconfident forecast sample
calsim2 = calibration_simplex(p1 = p1, p3 = p3, obs = obs2)
plot(calsim2, use_pvals = TRUE) # with multinomial p-values

# Unconditionally biased forecast sample
calsim3 = calibration_simplex(p1 = p1, p3 = p3, obs = obs3)
plot(calsim3)

# Using a different number of bins
calsim = calibration_simplex(n = 4, p1 = p1, p3 = p3, obs = obs3)
plot(calsim)

calsim = calibration_simplex(n = 13, p1 = p1, p3 = p3, obs = obs3)
plot(calsim,              # using some additional plotting parameters:
     error_scale = 0.5,   # errors are less pronounced (smaller shifts)
     min_bin_freq = 100,  # dots are plotted only for bins which contain
                          # at least 100 forecast-outcome pairs
     category_labels = c("below-normal", "near-normal", "above-normal"),
     main = "Sample calibration simplex")

detach(ternary_forecast_example)
Plot Calibration Simplex
## S3 method for class 'calibration_simplex'
plot(
  x,
  true_error = TRUE,
  error_scale = 0.3,
  min_bin_freq = 10,
  plot_error_scale = TRUE,
  scale_area = NULL,
  indicate_bins = TRUE,
  category_labels = c("1", "2", "3"),
  use_pvals = FALSE,
  alphas = c(0.1, 0.01),
  colors = c("blue", "orange", "red", "black"),
  ...
)
x | Object of class calibration_simplex. |
true_error | Logical, specifying whether to use true miscalibration errors or approximate miscalibration errors. |
error_scale | A number specifying the magnitude of the miscalibration errors (greater than 0; usually should be less than 1, cf. note below). |
min_bin_freq | A number. Lower bound for (absolute) frequencies, i.e. how many observations have to lie in a bin for it to be plotted. |
plot_error_scale | Logical, specifying whether to plot a scale showing the magnitude of miscalibration errors. |
scale_area | Optional. A number by which the areas of the points are scaled. Use if points are too small or too big. |
indicate_bins | Logical, specifying whether to connect points to their respective bin (center of hexagon). |
category_labels | A vector of length 3 containing the category names, e.g. c("below-normal", "near-normal", "above-normal"). |
use_pvals | Logical, determining whether multinomial p-values are used for uncertainty quantification, see details. |
alphas | Vector of length 2 with values 1 > alphas[1] > alphas[2] > 0, the significance levels used to color the points (defaults to 0.1 and 0.01). Only relevant if use_pvals = TRUE; see details. |
colors | Vector of length 4 specifying the colors used to indicate the p-value ranges, defaults to c("blue", "orange", "red", "black"). Only relevant if use_pvals = TRUE; see details. |
... | Arguments concerning the title (e.g. main). |
If multinomial p-values are used (use_pvals = TRUE), the dots are colored in the following way:
colors[1] (blue by default): p-value greater than alphas[1] (0.1 by default).
colors[2] (orange by default): p-value between alphas[1] and alphas[2] (0.1 and 0.01 by default).
colors[3] (red by default): p-value less than alphas[2] (0.01 by default).
colors[4] (black by default): p-value is exactly 0. This only happens if a category which is assigned 0 probability realizes.
Many small p-values (orange and red dots) indicate miscalibrated predictions, whereas many blue dots indicate that the predictions may in fact be calibrated. WARNING: The use of multinomial p-values is an experimental feature and may not yield reliable p-values, especially if n is small. For details regarding the calculation of the p-values, see also calibration_simplex.
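As an illustration of these parameters (reusing the cs object from the earlier sketch; the levels and colors below are arbitrary choices, not package defaults):

plot(cs, use_pvals = TRUE,
     alphas = c(0.05, 0.005),  # stricter significance levels
     colors = c("darkgreen", "gold", "firebrick", "black"))  # custom colors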
For details on the meaning of the error scale, see Wilks (2013), especially Fig. 2. Note that the miscalibration error in each category is in "probability units" (as it is the average difference between relative frequency and forecast probability in each bin).
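As a sketch of what the displayed errors correspond to (assuming, as the component names above suggest, that both matrices have one row per bin and one column per category):

# exact per-bin miscalibration errors in probability units
err <- cs$cond_rel_freq - cs$cond_ave_prob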
10,000 realizations of an idealized ternary probability forecast, generated as described in Wilks (2013). Depending on which outcome variable is paired with the forecast probabilities, the forecast exhibits different (mis)calibration characteristics.
data(ternary_forecast_example)
A data frame with 10,000 rows and 6 variables:
p1 | forecast probability for outcome 1 |
p3 | forecast probability for outcome 3 |
obs0 | outcomes, such that the forecast is well-calibrated |
obs1 | outcomes, such that the forecast is overconfident |
obs2 | outcomes, such that the forecast is underconfident |
obs3 | outcomes, such that the forecast is unconditionally biased |
Data generated by the package author.
Wilks, D. S. (2013). The Calibration Simplex: A Generalization of the Reliability Diagram for Three-Category Probability Forecasts. Weather and Forecasting, 28, 1210-1218.