Plot the confidence interval for a model fit

I’m no data scientist, so it’s likely that my question does not make much sense… But here goes:

I recently tried to do some data analysis that involved fitting models. I’m able to fit the model and plot the prediction it produces, but I’d like to also compute and plot confidence intervals for the model.

Here is my minimal (but incomplete) example:

using DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

using GLM
model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pred.y = predict(model, pred);

using Plots
plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pred.y, label="model", linewidth=3)
savefig("/tmp/plot.png")


To rephrase (because I’m not even sure I’m using the correct words here): I’d like to represent the uncertainty about the model as a shaded area around the “model” curve.

Is there a (not too complicated) way to do that? Ideally, the solution should be as independent of the model as possible, because my real use-case involves more complex models…

If you replace your last plot! statement (before savefig) with:

plot!(pred.x, pred.y,
      ribbon = (fill(0.3, length(pred.x)), 0.5 .+ 0.1*cos.(10pi*pred.x)),
      fc = :orange, fa = 0.3, label = "model", linewidth = 3)

you get:


OK – my example of a ribbon argument is not realistic, but then I don’t know how to use GLM… If GLM lets you produce prediction confidence intervals, my example should give an idea of how to plot them.

Could you try ]add StatsPlots#mkb/statsmodels? It’s the branch of this PR https://github.com/JuliaPlots/StatsPlots.jl/pull/293
Check the example in the first comment for usage. If you comment on the PR with your experiences we can merge it and make this available to everyone.

EDIT: I accidentally led this thread down a rabbit hole below by plotting a prediction interval rather than a confidence interval, and also by bungling the use of ribbon. Long story short: the ggplot2 plot shown below can also be obtained with Plots.jl/GLM.jl by calling predict(model, pred, interval = :confidence, level = 0.95).

@BLI has the right plot command, let me add the GLM command you are probably after:


using DataFrames, GLM, Plots
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pr = predict(model, pred, interval = :prediction, level = 0.95)

plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pr.prediction, label="model", linewidth=3, 
        ribbon = (pr.prediction .- pr.lower, pr.upper .- pr.prediction))

The relevant docstring is:

?help> predict


  predict(mm::LinearModel, newx::AbstractMatrix;
          interval::Union{Symbol,Nothing} = nothing, level::Real = 0.95)

  If interval is nothing (the default), return a vector with the predicted values for model mm and new data newx. Otherwise, return a 3-column matrix with the prediction and the lower and upper confidence bounds for a given level (0.95 equates alpha = 0.05).
  Valid values of interval are :confidence delimiting the uncertainty of the predicted relationship, and :prediction delimiting estimated bounds for new data points.
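For instance, comparing the two interval types on the toy data from above (a sketch; with a DataFrame input, either call returns a table with prediction, lower and upper columns, as used in the code above):

```julia
using DataFrames, GLM
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);
model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);

# :confidence → uncertainty of the fitted mean relationship
conf = predict(model, pred, interval = :confidence, level = 0.95)
# :prediction → bounds for new observations (wider: also includes residual noise)
pr = predict(model, pred, interval = :prediction, level = 0.95)
```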

@nilshg I believe ribbon is used incorrectly in your example

Ah yes sorry as usual I got confused with what ribbon does - let me fix that

The confidence intervals are in fact so small that you almost can’t see the ribbon. This is with your code using the StatsPlots PR:

using DataFrames, StatsPlots, GLM
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);
model = lm(@formula(y ~ x), data)

@df data scatter(:x, :y)
plot!(model)

Now I’m confused – for my first prediction I get [0.98, 1.10] for the lower and upper confidence interval, which looks about right for the ribbon in my plot above. What is StatsPlots plotting?

This is how it looks in ggplot2 (same data with RCall)


I think you swapped the upper and the lower confidence interval perhaps? The ribbon keyword really sucks for this kind of thing in fact.
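For anyone else tripped up by this: as I understand it, ribbon takes offsets below and above the y series, not absolute bounds, so interval bounds have to be converted first. A minimal sketch:

```julia
using Plots
x = 0:0.1:1
y = 2 .* x
lower = y .- 0.2   # absolute lower bound
upper = y .+ 0.1   # absolute upper bound
# ribbon = (distance below y, distance above y), not (lower, upper)
plot(x, y, ribbon = (y .- lower, upper .- y))
```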

Many thanks to both of you!

I have been able to successfully use both @nilshg’s “manual” method and @mkborregaard’s StatsPlot branch to get the same plots in my simple example.

In my real case, I’m only able to use the “manual” method, which I guess is because my model is slightly more complicated: @formula(y ~ x + x^2). It looks like I’m hitting the limitations mentioned in the PR: “it does only address the bivariate case”.
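For reference, here is roughly what the “manual” method looks like with the quadratic formula (a sketch with made-up coefficients for the toy data):

```julia
using DataFrames, GLM, Plots
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .- data.x.^2 .+ 0.1*rand(100);
model = lm(@formula(y ~ x + x^2), data)
pred = DataFrame(x = 0:0.01:1);
pr = predict(model, pred, interval = :prediction, level = 0.95)
plot(pred.x, pr.prediction, linewidth = 3,
     ribbon = (pr.prediction .- pr.lower, pr.upper .- pr.prediction))
```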


As a follow-up, would there be a way to extend this to Generalized Linear Models? It looks like predict does not support the interval keyword for such models…

For example, the previous case, with

model = glm(@formula(y ~ x), data, Normal(), IdentityLink())

You’re right, it’s not yet implemented in GLM (it isn’t in R either, if that’s any help). There’s a PR on GLM.jl for it, but it’s bugged (IMHO).
Could you comment about the polynomial case on the StatsPlots PR? That seems like the first thing to add.
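In the meantime, one manual workaround (my own sketch, not a GLM.jl API) is to build the confidence band on the link scale from the model matrix and the coefficient covariance, then map it back through the inverse link (a no-op here, since the link is the identity):

```julia
using DataFrames, GLM, Distributions
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);
model = glm(@formula(y ~ x), data, Normal(), IdentityLink())

newx = 0:0.01:1
X = [ones(length(newx)) collect(newx)]         # model matrix for the new data
eta = X * coef(model)                          # linear predictor (link scale)
se = [sqrt(r' * vcov(model) * r) for r in eachrow(X)]
z = quantile(Normal(), 0.975)                  # 95% band
lower = eta .- z .* se                         # apply the inverse link to eta,
upper = eta .+ z .* se                         # lower and upper for other links
```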

Hilariously I contributed the predict with intervals to GLM like 3 years ago, exactly because I wanted to create support for this - and I still haven’t really come around to it. So nice with a bit of a push.

Will do. Thanks!

I am getting an error (Julia 1.5) with
plot!(model)

ERROR: Cannot convert StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}} to series data for plotting
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] _prepare_series_data(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:8
[3] _series_data_vector(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Dict{Symbol,Any}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:27
[4] macro expansion at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:139 [inlined]
[5] apply_recipe(::Dict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Nothing, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Nothing) at C:\Users\hafez.julia\packages\RecipesBase\AN696\src\RecipesBase.jl:282
[6] _process_userrecipes!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\user_recipe.jl:35
[7] recipe_pipeline!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\RecipesPipeline.jl:68
[8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:167
[9] #plot!#127 at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:158 [inlined]
[10] plot!(::Plots.Plot{Plots.GRBackend}, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:155
[11] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:150
[12] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:144
[13] top-level scope at REPL[12]:1
[14] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088

Did you check out the PR?

to revive this, I currently get

data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ rand(100);
r = lm(@formula(y ~ x), data)
pred = GLM.predict(r, data, interval = :confidence, level = 0.95)
p = @df data scatter(:x, :y, leg = false,
    xlab = "Model",
    ylab = "Data")
plot!(p, data.x, pred.prediction, linewidth = 2,
    ribbon = (pred.prediction .- pred.lower, pred.upper .- pred.prediction))

FWIW (say, for QC), a shameless plug:

using LinearFitXYerrors, DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ rand(100);
st = linearfitxy(data.x, data.y, isplot=true, ratio=:auto)

Nice! Is this a standalone plotting package?

@floswald, LinearFitXYerrors.jl is a small Julia package to perform 1D linear fitting of experimental data with possibly correlated errors in the X and Y variables, based on York et al. (2004).

linearfitxy() returns the regression results and uncertainties in a structure. A plot is displayed by setting the optional parameter isplot = true.

@floswald for that to work the data you predict on need to be sorted
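i.e. something along these lines before predicting (assuming the same data frame and model as in the snippet above):

```julia
sort!(data, :x)   # ribbon assumes x is monotonically increasing
pred = GLM.predict(r, data, interval = :confidence, level = 0.95)
```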