Note: this is updated code (corrects bugs in previous versions)
Recently, there has been a rise in interest in “prior-free posterior distributions”.
There is a related neat new package ConformalPrediction.jl (by @pat-alt)
This package gives prediction intervals for supervised ML models.
Using training data D_{n} \equiv \{ (X_i, Y_i)\}_{i=1}^{n}, fit a model Y_{i} = f(X_{i}).
Next, given a new X_{n+1}, a prediction interval is a set-valued function
C_{1-\alpha}= C(D_{n}, X_{n+1}, \alpha) = \left[ L_{1-\alpha}, U_{1-\alpha}\right]
such that P( Y_{n+1} \in C_{1-\alpha}) \geq 1-\alpha for miscoverage \alpha \in [0,1].
My goal is to recover the conditional CDF \hat{F}(Y_{n+1}|X).
My first guess.
Suppose we have a very simple stylized environment. W/ symmetric prediction intervals, w/ perfect coverage, centered at the median etc.
If P( Y_{n+1} \in C_{1-\alpha}) = 1-\alpha and C_{1-\alpha}= \left[ L_{1-\alpha}, U_{1-\alpha}\right]
then P( Y_{n+1} \leq L_{1-\alpha}) = \alpha/2 and P( Y_{n+1} \leq U_{1-\alpha}) = 1- \alpha/2,
then for quantile q\in [0,0.5], \hat{F}^{-1}(q)=L_{1-2q}, and for quantile q\in [0.5,1], \hat{F}^{-1}(q)=U_{2q-1}.
Does this make sense?
It’s possible to imagine cases where this wouldn’t work, for example C_{90\%}=[L_{90\%},U_{90\%}] where P( Y_{n+1} \leq L_{90\%}) =2\% and P( Y_{n+1} \leq U_{90\%}) = 92\%.
However we still have P(Y_{n+1}\in C_{90\%}) = 90\%.
Here is an illustration:
using MLJ, Distributions, ConformalPrediction, Plots, Random, Tables;
LinearRegressor =  @load LinearRegressor  pkg=MLJLinearModels
################################################################################
# Make Data  #
################################################################################
n= 10_000; σ=1.0;
X = [ones(n) sin.(1:n) cos.(n:-1:1) exp.(sin.(1:n))]
θ = [9.0; 7.0; 3.0; 21]
Noise = randn(MersenneTwister(49), n);
cef(x;θ=θ)  = x*θ;      # Conditional Expectation Function 
skedastic(x;σ=σ) = σ;   # Skedastic Function
y = cef(X;θ=θ) + skedastic(X;σ=σ)*Noise
train, calibration, test = partition(eachindex(y), 0.4, 0.4, shuffle=true, rng=444)
################################################################################
# Fit model on training data  #
################################################################################
model = LinearRegressor(fit_intercept = false) 
mach = machine(model, Tables.table(X), y)
fit!(mach, rows=train)
pr_y = predict(mach)
### make sure predictions unbiased outside the training sample.
[mean(pr_y[jj] - y[jj]) for jj in [:, train, calibration, test] ]
# plot(pr_y - y) # plot(pr_y[test] - y[test]) # plot(pr_y[train] - y[train])
################################################################################
# Calibrate on calibration data  #
# Compute prediction intervals on test data #
################################################################################
conf_mach = conformal_machine(mach)
calibrate!(conf_mach, selectrows(X, calibration), y[calibration])
# 1: compare simulated coverage w/ theoretical coverage
grid_coverage = 0.00:0.01:1.00;
sim_coverage =[];
for jj in eachindex(grid_coverage)
    PI = predict(conf_mach, X[test,:]; coverage = grid_coverage[jj] )
    PI_LB = [PI[j][1][2][] for j in 1:length(test)]
    PI_UB = [PI[j][2][2][] for j in 1:length(test)]
    m =mean(PI_LB .<= y[test] .<= PI_UB)   # % of time ynew is in PI. 
    push!(sim_coverage, m)
end
plot(legend=:topleft, xlabel = "Desired Coverage", ylabel = "Simulated Coverage",)
plot!(grid_coverage, sim_coverage, lab="Simulated Coverage")
plot!(grid_coverage, x->x, c=3, l=:dash, lab="Perfect Coverage") # , lw=3
# 2: Plot PI for coverage = 90% 
PI_90 = predict(conf_mach, X[test,:]; coverage = 0.90)
PI_90_LB = [PI_90[j][1][2][] for j in 1:length(test)]
PI_90_UB = [PI_90[j][2][2][] for j in 1:length(test)]
plot()
plot!(PI_90_LB)
plot!(PI_90_UB)
plot!(y[test])
plot!(pr_y[test])
# 3: Plot PI length for each X[test,:] for coverage = 90% 
plot( PI_90_UB .- PI_90_LB, ylims=(0,2*mean( PI_90_UB .- PI_90_LB)), lab="Length 90% CI for each X")
# 4: recover CDF F̂(Y_{n+1} | X) at one point. Compare w/ true CDF
taz = 14; xt = [X[test[taz],:] ;;]'
grid_coverage = 0.00:0.01:1.00; LB = []; UB = [];
for jj in eachindex(grid_coverage)
    PI = predict(conf_mach, xt; coverage = grid_coverage[jj] )
    push!(LB, PI[1][1][2][])
    push!(UB, PI[1][2][2][])
end
#
sim_quantiles = [sort(LB);UB]
probs         = [0.5*grid_coverage; 0.5 .+ 0.5*grid_coverage]
point_predict = pr_y[test[taz]]
CDF_true      = Normal((xt*θ)[], σ)
#
plot(legend=:topleft)
plot!(sim_quantiles, probs, lab="Empirical quantiles")
plot!([point_predict], seriestype = :vline, lab="y prediction", color="red") 
plot!(sim_quantiles, x -> cdf(CDF_true, x)  , lab="True CDF, N(Xθ,σ)",l=:dash, lw=3)
1: compare the simulated coverage w/ perfect coverage

2: plot the PI length for 90% coverage at each X in the test sample

Note: Constant length doesn’t allow for heteroskedastic PIs as in Chernozhukov et al PNAS 2021.
3: attempt to recover the CDF from the conformal PIs & compare w/ the true CDF

Here is code to compare 6 models (linear, EvoTrees, LGBM, XGB, NuSVR, EpsilonSVR):
# Fit multiple models 
mod_set =[LinearRegressor(fit_intercept = false)
    EvoTreeRegressor(nrounds = 100)
    LGBMRegressor(num_iterations = 100) 
    XGBoostRegressor(num_round = 100) 
    NuSVR()
    EpsilonSVR()
    ];
#
grid_coverage = 0.00:0.01:1.00;
#
m_pr_y = zeros(n, length(mod_set))
m_sim_coverage = zeros(length(grid_coverage),length(mod_set))
m_sim_quantiles = zeros(2*length(grid_coverage),length(mod_set)) 
m_probs = zeros(2*length(grid_coverage),length(mod_set)) 
m_point_predict = zeros(length(mod_set))
# plot(legend=:topleft)
for mm in eachindex(mod_set)
    # mm=3
    model = mod_set[mm]
    println("$model")
    #
    ### Fit on training data 
    #
    mach = machine(model, Tables.table(X), y)
    fit!(mach, rows=train)
    pr_y = predict(mach)
    m_pr_y[:,mm] = pr_y
    #
    # [mean(pr_y[jj] - y[jj]) for jj in [:, train, calibration, test] ] |> display
    # plot!(pr_y - y, lab="$model") |> display # plot(pr_y[test] - y[test]) # plot(pr_y[train] - y[train])
    #
    ### Calibrate on calibration data/Compute prediction intervals on test data
    #
    conf_mach = conformal_machine(mach)
    calibrate!(conf_mach, selectrows(X, calibration), y[calibration])
    # 1: compare simulated coverage w/ theoretical coverage
    # grid_coverage = 0.00:0.01:1.00;
    sim_coverage =[];
    for jj in eachindex(grid_coverage)
        PI = predict(conf_mach, X[test,:]; coverage = grid_coverage[jj] )
        PI_LB = [PI[j][1][2][] for j in 1:length(test)]
        PI_UB = [PI[j][2][2][] for j in 1:length(test)]
        m =mean(PI_LB .<= y[test] .<= PI_UB)   # % of time ynew is in PI. 
        push!(sim_coverage, m)
    end
    m_sim_coverage[:,mm] = sim_coverage
    #
    # 4: recover CDF F̂(Y_{n+1} | X) at one point. Compare w/ true CDF
    taz = 14; 
    # xt = [X[test[taz],:] ;;]'
    xt = selectrows(X, test[taz])
    grid_coverage = 0.00:0.01:1.00; LB = []; UB = [];
    for jj in eachindex(grid_coverage)
        PI = predict(conf_mach, xt; coverage = grid_coverage[jj] )
        push!(LB, PI[1][1][2][])
        push!(UB, PI[1][2][2][])
    end
    #
    sim_quantiles = [sort(LB);UB]
    probs         = [0.5*grid_coverage; 0.5 .+ 0.5*grid_coverage]
    point_predict = pr_y[test[taz]]
    # CDF_true      = Normal((xt*θ)[], σ)
    m_sim_quantiles[:,mm] = sim_quantiles
    m_probs[:,mm] =probs 
    m_point_predict[mm] = point_predict
end     
plot(legend=:topleft, xlabel = "Desired Coverage", ylabel = "Simulated Coverage",)
plot!(grid_coverage, x->x, c=3, l=:dash, lw=3, lab="Perfect Coverage") # 
plot!(grid_coverage, m_sim_coverage[:,1], lab="Linear")
plot!(grid_coverage, m_sim_coverage[:,2], lab="EvoTreeRegressor")
plot!(grid_coverage, m_sim_coverage[:,3], lab="LGBMRegressor")
plot!(grid_coverage, m_sim_coverage[:,4], lab="XGBoostRegressor")
plot!(grid_coverage, m_sim_coverage[:,5], lab="NuSVR")
plot!(grid_coverage, m_sim_coverage[:,6], lab="EpsilonSVR")
##########
plot(legend=:topleft, xlabel = "Desired Coverage", ylabel = "Coverage Error",)
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,1], lab="Linear")
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,2], lab="EvoTreeRegressor")
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,3], lab="LGBMRegressor")
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,4], lab="XGBoostRegressor")
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,5], lab="NuSVR")
plot!(grid_coverage, grid_coverage .- m_sim_coverage[:,6], lab="EpsilonSVR")
plot!([0.0], seriestype = :hline, l=:dash, lab="perfect", color="red") 
##########################
plot(legend=:topleft)
#
plot!(m_sim_quantiles[:,1], m_probs[:,1], lab="Linear Regression")
plot!(m_sim_quantiles[:,2], m_probs[:,2], lab="EvoTreeRegressor")
plot!(m_sim_quantiles[:,3], m_probs[:,3], lab="LGBMRegressor")
plot!(m_sim_quantiles[:,4], m_probs[:,4], lab="XGBoostRegressor")
plot!(m_sim_quantiles[:,5], m_probs[:,5], lab="NuSVR")
plot!(m_sim_quantiles[:,6], m_probs[:,6], lab="EpsilonSVR")
# plot!([m_point_predict], seriestype = :vline, lab="y point predictions", color="red") 
plot!(range(minimum(m_sim_quantiles), maximum(m_sim_quantiles), length=202), x -> cdf(CDF_true, x)  , lab="True CDF, N(Xθ,σ)",l=:dash, lw=3)
First compare coverage across models:

Or coverage error

Next compare the simulated conditional CDFs \hat{F}(Y_{n+1}|X) with the truth

Can anyone point me to theory about the distribution of the statistic \hat{F}(Y_{n+1}|X)?
Another simple example: suppose the error \varepsilon \sim U(0,1), then y= X\theta +\sigma \varepsilon \sim U(X\theta, X\theta +\sigma), hence the true conditional CDF F(y|x) = \frac{y-x\theta}{\sigma}.
Repeat code above except:
Noise = rand(MersenneTwister(49), n); # Uniform[0,1] 
CDF_true      = Uniform(μ,μ +σ)

