[ANN] ConformalPrediction.jl: Uncertainty quantification through conformal prediction for machine learning models trained in MLJ

Very excited to announce ConformalPrediction.jl :tada:

ConformalPrediction


ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ (Blaom et al. 2020). Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.

:open_book: Background

Conformal Prediction is a scalable frequentist approach to uncertainty quantification and coverage control. It promises to be an easy-to-understand, distribution-free and model-agnostic way to generate statistically rigorous uncertainty estimates. Interestingly, it can even be used to complement Bayesian methods.
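For reference, the textbook split (inductive) conformal recipe behind these guarantees can be written as follows. This is a generic summary under the usual exchangeability assumption, not a description of the package internals. Given a nonconformity score $s$, calibration scores $s_1, \dots, s_n$ and a miscoverage level $\alpha$:

$$
\hat q = s_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil, \qquad \hat C(x) = \{\, y : s(x, y) \le \hat q \,\},
$$

$$
\mathbb{P}\big(Y_{n+1} \in \hat C(X_{n+1})\big) \ge 1 - \alpha,
$$

where $s_{(k)}$ is the $k$-th smallest calibration score (with $\hat q = \infty$ if $k > n$). For regression with $s(x, y) = |y - \hat\mu(x)|$ this yields the interval $\hat C(x) = [\hat\mu(x) - \hat q,\ \hat\mu(x) + \hat q]$.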

The animation below is lifted from a small blog post that introduces the topic and the package ([TDS], [Quarto]). It shows conformal prediction sets for two different samples and changing coverage rates. Standard conformal classifiers produce set-valued predictions: for ambiguous samples these sets are typically large (for high coverage) or empty (for low coverage).
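The coverage/set-size behaviour described above can be reproduced with a few lines of plain Julia. This is a sketch under stated assumptions: it uses a LABEL-style score s(x, y) = 1 - p̂(y | x), made-up probabilities instead of a fitted model, and `conformal_quantile` and `prediction_set` are hypothetical helpers, not functions exported by ConformalPrediction.jl.

```julia
# Sketch of a LABEL-style split conformal classifier (not the package's code).
# Nonconformity score: s(x, y) = 1 - p̂(y | x). All probabilities below are made up.

# Finite-sample conformal quantile: the ⌈(n+1)·coverage⌉-th smallest score (Inf if k > n)
function conformal_quantile(scores::AbstractVector{<:Real}, coverage::Real)
    n = length(scores)
    k = ceil(Int, (n + 1) * coverage)
    return k > n ? Inf : sort(scores)[k]
end

# Prediction set: indices of all classes whose score does not exceed the threshold
prediction_set(probs::AbstractVector{<:Real}, qhat::Real) = findall(p -> 1 - p <= qhat, probs)

# Hypothetical predicted probabilities of the *true* class on 19 calibration samples
cal_true_probs = [0.95, 0.9, 0.9, 0.85, 0.8, 0.8, 0.75, 0.7, 0.7, 0.65,
                  0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2]
cal_scores = 1 .- cal_true_probs

# Two hypothetical test samples over three classes
confident = [0.90, 0.07, 0.03]
ambiguous = [0.40, 0.33, 0.27]

for coverage in (0.5, 0.8, 0.95)
    qhat = conformal_quantile(cal_scores, coverage)
    println("coverage = $coverage: confident -> ", prediction_set(confident, qhat),
            ", ambiguous -> ", prediction_set(ambiguous, qhat))
end
```

With these numbers the ambiguous sample gets an empty set at 50% coverage, a single label at 80% and all three labels at 95%, while the confident sample keeps a single label throughout.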

Conformal Prediction in action: Prediction sets for two different samples and changing coverage rates. As coverage grows, so does the size of the prediction sets.

:triangular_flag_on_post: Installation

You can install the latest stable release from the general registry:

using Pkg
Pkg.add("ConformalPrediction")

The development version can be installed as follows:

using Pkg
Pkg.add(url="https://github.com/pat-alt/ConformalPrediction.jl")

:repeat: Status

This package is in its early stages of development and therefore still subject to changes to the core architecture and API. The following CP approaches have been implemented in the development version:

Regression:

  • Inductive
  • Naive Transductive
  • Jackknife
  • Jackknife+
  • Jackknife-minmax
  • CV+
  • CV-minmax
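To make one of the listed regression methods concrete, here is a sketch of the standard Jackknife+ construction in plain Julia. It is not the package's implementation: `fit_line` and `predict_line` are hypothetical helpers (least squares on a single feature) standing in for any regressor, and the data are synthetic. Each leave-one-out model contributes a residual R_i and a prediction at the new point; the interval endpoints are order statistics of those predictions minus and plus the residuals.

```julia
# Hypothetical stand-in regressor: ordinary least squares for y ≈ a + b*x
function fit_line(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    xm, ym = sum(x) / length(x), sum(y) / length(y)
    b = sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
    return (a = ym - b * xm, b = b)
end
predict_line(m, x) = m.a + m.b * x

# Jackknife+ interval at a single new point xnew with miscoverage level α
function jackknife_plus(x, y, xnew; α = 0.1)
    n = length(y)
    lower = Vector{Float64}(undef, n)
    upper = Vector{Float64}(undef, n)
    for i in 1:n
        keep = [j for j in 1:n if j != i]       # leave sample i out
        m = fit_line(x[keep], y[keep])
        r = abs(y[i] - predict_line(m, x[i]))   # leave-one-out residual R_i
        μ = predict_line(m, xnew)               # leave-one-out prediction at xnew
        lower[i] = μ - r
        upper[i] = μ + r
    end
    k_lo = floor(Int, α * (n + 1))              # ⌊α(n+1)⌋-th smallest lower value
    k_hi = ceil(Int, (1 - α) * (n + 1))         # ⌈(1-α)(n+1)⌉-th smallest upper value
    lo = k_lo < 1 ? -Inf : sort(lower)[k_lo]
    hi = k_hi > n ? Inf : sort(upper)[k_hi]
    return (lo, hi)
end

# Synthetic data: linear trend plus bounded deterministic "noise"
x = collect(1.0:20.0)
y = 2 .* x .+ 1 .+ sin.(x)
lo, hi = jackknife_plus(x, y, 10.5; α = 0.1)
```

With `α = 0.01` and only 20 samples, the required order statistics fall outside the sample and the sketch returns `(-Inf, Inf)`, the same finite-sample caveat as for the split conformal quantile.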

Classification:

  • Inductive (LABEL (Sadinle, Lei, and Wasserman 2019))
  • Adaptive Inductive
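The adaptive variant replaces the LABEL score 1 - p̂(y | x) with a cumulative one, so that the set size adapts to how spread out the predicted probabilities are. Below is a minimal sketch of the non-randomized adaptive prediction set (APS) score; the calibration probabilities, labels and test vectors are made up, the helper names are hypothetical, and whether the package's Adaptive Inductive classifier uses exactly this non-randomized score is an assumption.

```julia
# Non-randomized APS score: total probability mass of all classes ranked at least
# as high as class y (ties included), i.e. cumulative mass down to and including y
aps_score(probs::AbstractVector{<:Real}, y::Integer) = sum(p for p in probs if p >= probs[y])

# Prediction set: every class whose APS score is within the calibrated threshold
aps_set(probs::AbstractVector{<:Real}, qhat::Real) =
    [y for y in eachindex(probs) if aps_score(probs, y) <= qhat]

# Finite-sample conformal quantile: the ⌈(n+1)·coverage⌉-th smallest score (Inf if k > n)
function conformal_quantile(scores::AbstractVector{<:Real}, coverage::Real)
    n = length(scores)
    k = ceil(Int, (n + 1) * coverage)
    return k > n ? Inf : sort(scores)[k]
end

# Hypothetical calibration data: predicted class probabilities and true labels
cal_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.3, 0.2],
             [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
             [0.2, 0.2, 0.6], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2]]
cal_labels = [1, 2, 1, 2, 2, 3, 1, 2, 1]

cal_scores = [aps_score(p, y) for (p, y) in zip(cal_probs, cal_labels)]
qhat = conformal_quantile(cal_scores, 0.8)

peaked = [0.85, 0.10, 0.05]   # model is fairly sure
flat   = [0.40, 0.35, 0.25]   # model is unsure
println(aps_set(peaked, qhat), " ", aps_set(flat, qhat))
```

Here the peaked sample receives the single label `[1]`, while the flat one receives `[1, 2]`.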

The package has been tested for the following supervised models offered by MLJ.

Regression:

using ConformalPrediction
keys(tested_atomic_models[:regression])
# KeySet for a Dict{Symbol, Expr} with 4 entries. Keys:
#   :nearest_neighbor
#   :evo_tree
#   :light_gbm
#   :decision_tree

Classification:

keys(tested_atomic_models[:classification])
# KeySet for a Dict{Symbol, Expr} with 4 entries. Keys:
#   :nearest_neighbor
#   :evo_tree
#   :light_gbm
#   :decision_tree

:mag: Usage Example

To illustrate the intended use of the package, let's have a quick look at a simple regression problem. Using MLJ, we first generate some synthetic data and then determine indices for our training and test data:

using MLJ
X, y = MLJ.make_regression(1000, 2)              # 1000 samples, 2 features
train, test = partition(eachindex(y), 0.4, 0.4)   # returns three index sets (40%/40%/20%); the first two are kept

We then import a gradient-boosted tree regressor from EvoTrees.jl following the standard MLJ procedure.

EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
model = EvoTreeRegressor() 

To turn our conventional model into a conformal model, we just need to wrap it with the conformal_model function. The resulting conformal model instance can then be bound to data to create a machine. Finally, we fit the machine on the training data using the generic fit! method:

using ConformalPrediction
conf_model = conformal_model(model)
mach = machine(conf_model, X, y)
fit!(mach, rows=train)
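The wrapper idea can be illustrated without MLJ. The sketch below is hypothetical: `InductiveConformal`, `fit_conformal`, `predict_interval`, the 50/50 split and the least-squares stand-in model are illustrative assumptions, not ConformalPrediction.jl's internals or API. It shows how a split conformal regressor can stay model-agnostic: it only needs a `fit` and a `predict` function, sets some rows aside for calibration, and turns the absolute residuals on those rows into a symmetric interval.

```julia
using LinearAlgebra, Random

# Hypothetical wrapper (NOT the package's implementation): any point predictor given
# as a pair of fit/predict functions becomes an inductive (split) conformal regressor.
struct InductiveConformal{F,P}
    fit::F                # (X, y) -> fitted model
    predict::P            # (fitted model, X) -> vector of point predictions
    train_ratio::Float64  # share of rows used for the proper training set
    coverage::Float64     # target coverage, e.g. 0.9
end

function fit_conformal(c::InductiveConformal, X::AbstractMatrix, y::AbstractVector;
                       rng = MersenneTwister(1))
    n = length(y)
    idx = shuffle(rng, 1:n)
    ntrain = floor(Int, c.train_ratio * n)
    proper, calib = idx[1:ntrain], idx[(ntrain + 1):end]
    model = c.fit(X[proper, :], y[proper])
    # Nonconformity scores: absolute residuals on the calibration rows
    scores = abs.(y[calib] .- c.predict(model, X[calib, :]))
    k = ceil(Int, (length(calib) + 1) * c.coverage)
    qhat = k > length(calib) ? Inf : sort(scores)[k]
    return (model = model, qhat = qhat)
end

# Symmetric intervals (yhat - qhat, yhat + qhat) around each point prediction
function predict_interval(c::InductiveConformal, fitted, X::AbstractMatrix)
    return [(yhat - fitted.qhat, yhat + fitted.qhat) for yhat in c.predict(fitted.model, X)]
end

# Usage with ordinary least squares as the wrapped (hypothetical) model
ols_fit(X, y) = [ones(size(X, 1)) X] \ y
ols_predict(β, X) = [ones(size(X, 1)) X] * β

x = collect(1.0:100.0)
X = reshape(x, :, 1)
y = 3 .* x .+ sin.(x)   # linear signal plus bounded deterministic "noise"

conf = InductiveConformal(ols_fit, ols_predict, 0.5, 0.9)
fitted = fit_conformal(conf, X, y)
intervals = predict_interval(conf, fitted, reshape([10.5, 20.5], :, 1))
```

Because the score is an absolute residual, every interval produced this way has the same width, 2·qhat.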

Predictions can then be computed using the generic predict method. The code below produces predictions for the first n test samples. Each tuple contains the lower and upper bound of the prediction interval.

n = 10
Xtest = selectrows(X, first(test,n))
ytest = y[first(test,n)]
predict(mach, Xtest)
 (1)   ([-1.0398755385842378], [1.174649631946424])
 (2)   ([-0.8812446360021866], [1.333280534528475])
 (3)   ([-1.0186882105711579], [1.1958369599595038])
 (4)   ([-1.8854818442600265], [0.32904332627063515])
 (5)   ([-1.5473925987675485], [0.6671325717631131])
 (6)   ([-1.7896211025024724], [0.42490406802818925])
 (7)   ([-1.9246506093872306], [0.289874561143431])
 (8)   ([-0.9791712385383624], [1.2353539319922993])
 (9)   ([-1.7526388729209201], [0.4618862976097414])
(10)   ([-0.5015897849914924], [1.7129353855391694])
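In the printed predictions every interval has, up to floating-point error, the same width of about 2.2145. That is consistent with intervals of the form ŷ(x) ± q̂ produced by an inductive conformal regressor with absolute-residual scores. The snippet below simply recomputes widths and midpoints from the printed numbers; reading the midpoints as the wrapped model's point predictions is an assumption, not something the output states.

```julia
# Bounds copied from the predict(mach, Xtest) output above
lower = [-1.0398755385842378, -0.8812446360021866, -1.0186882105711579,
         -1.8854818442600265, -1.5473925987675485, -1.7896211025024724,
         -1.9246506093872306, -0.9791712385383624, -1.7526388729209201,
         -0.5015897849914924]
upper = [1.174649631946424, 1.333280534528475, 1.1958369599595038,
         0.32904332627063515, 0.6671325717631131, 0.42490406802818925,
         0.289874561143431, 1.2353539319922993, 0.4618862976097414,
         1.7129353855391694]

widths  = upper .- lower          # interval widths
centers = (lower .+ upper) ./ 2   # implied point predictions (assumption)
qhat    = widths[1] / 2           # implied half-width q̂

println(extrema(widths))          # both ends ≈ 2.2145: all intervals share one width
```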

:hammer_and_wrench: Contribute

Contributions are welcome! Please follow the SciML ColPrac guide.

:mortar_board: References

Blaom, Anthony D., Franz Kiraly, Thibaut Lienart, Yiannis Simillides, Diego Arenas, and Sebastian J. Vollmer. 2020. “MLJ: A Julia Package for Composable Machine Learning.” Journal of Open Source Software 5 (55): 2704. https://doi.org/10.21105/joss.02704.

Sadinle, Mauricio, Jing Lei, and Larry Wasserman. 2019. “Least Ambiguous Set-Valued Classifiers with Bounded Error Levels.” Journal of the American Statistical Association 114 (525): 223–34.


For anyone interested in contributing to this, please see also here