Plot the confidence interval for a model fit

I’m no data scientist, so it’s likely that my question does not make much sense… But here goes:

I recently tried to do some data analysis that involved fitting models. I’m able to fit the model and compute and represent a prediction that it produces, but I’d like to also (compute and) represent confidence intervals for the model.

Here is my minimal (but incomplete) example:

using DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

using GLM
model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pred.y = predict(model, pred);

using Plots
plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pred.y, label="model", linewidth=3)
savefig("/tmp/plot.png")


To rephrase (because I’m not even sure I’m using the correct words here): I’d like to represent the uncertainty about the model using a shaded area around the “model” curve.

Is there a (not too complicated) way to do that? Ideally, the solution should be as independent of the model as possible, because my real use-case involves more complex models…

4 Likes

If you replace your last plot! statement (before savefig) with:

plot!(pred.x, pred.y, ribbon=(fill(0.3, length(pred.x)), 0.5 .+ 0.1*cos.(10pi*pred.x)), fc=:orange, fa=0.3, label="model", linewidth=3)

you get:


OK – my example of a ribbon argument is not realistic, but then I don’t know how to use GLM… If GLM lets you produce prediction confidence intervals, my example should give you an idea of how to do it.

5 Likes

Could you try ]add StatsPlots#mkb/statsmodels? It’s the branch from this PR: https://github.com/JuliaPlots/StatsPlots.jl/pull/293
Check the example in the first comment for usage. If you comment on the PR with your experiences, we can merge it and make this available to everyone.

1 Like

EDIT: I accidentally led this thread down a rabbit hole by plotting a prediction interval rather than a confidence interval below, and also by bungling the use of ribbon. Long story short, the ggplot2 plot shown below can also be obtained in Plots.jl/GLM.jl if one calls predict(model, pred, interval = :confidence, level = 0.95).

@BLI has the right plot command, let me add the GLM command you are probably after:


using DataFrames, GLM, Plots
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pr = predict(model, pred, interval = :prediction, level = 0.95)

plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pr.prediction, label="model", linewidth=3, 
        ribbon = (pr.prediction .- pr.lower, pr.upper .- pr.prediction))

The relevant docstring is:

?help> predict


  predict(mm::LinearModel, newx::AbstractMatrix;
          interval::Union{Symbol,Nothing} = nothing, level::Real = 0.95)

  If interval is nothing (the default), return a vector with the predicted values for model mm and new data newx. Otherwise, return a 3-column matrix with the prediction and the lower and upper confidence bounds for a given level (0.95 equates alpha = 0.05).
  Valid values of interval are :confidence delimiting the uncertainty of the predicted relationship, and :prediction delimiting estimated bounds for new data points.
5 Likes
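Putting that docstring to use on the toy data from earlier in the thread, a minimal sketch of the :confidence variant (this is my own assembly of the snippets above, not a snippet from the thread) would look like:

```julia
using DataFrames, GLM, Plots

# Toy data, as in the earlier posts
data = DataFrame(x = rand(100))
data.y = 1 .+ 2 .* data.x .+ 0.1 .* rand(100)

model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1)

# :confidence gives the (narrower) uncertainty band of the fitted relationship;
# :prediction gives the wider bounds expected for new observations
ci = predict(model, pred, interval = :confidence, level = 0.95)

plot(data.x, data.y, seriestype = :scatter, label = "data", legend = :bottomright)
plot!(pred.x, ci.prediction, linewidth = 3, label = "model",
      ribbon = (ci.prediction .- ci.lower, ci.upper .- ci.prediction))
```

With a table as input, predict returns a DataFrame with prediction, lower, and upper columns, which is why the ribbon offsets can be computed directly from it.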

@nilshg I believe the ribbon is called wrongly in your example

2 Likes

Ah yes sorry as usual I got confused with what ribbon does - let me fix that

They are in fact so small you almost can’t see the ribbon. This is with your code using the StatsPlots PR:

using DataFrames, StatsPlots, GLM
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);
model = lm(@formula(y ~ x), data)

@df data scatter(:x, :y)
plot!(model)

1 Like

Now I’m confused – for my first prediction I get [0.98, 1.10] for the upper and lower confidence interval, which looks about right for the ribbon in my plot above. What is StatsPlots plotting?

2 Likes

This is how it looks in ggplot2 (same data with RCall)


I think you swapped the upper and the lower confidence interval perhaps? The ribbon keyword really sucks for this kind of thing in fact.

1 Like
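Since the ribbon keyword tripped people up twice here, a tiny self-contained sketch of its semantics may help: ribbon takes (below, above) offsets measured from the series, not absolute lower/upper bounds, so confidence limits have to be converted first.

```julia
using Plots

y     = [1.0, 2.0, 3.0]
lower = [0.8, 1.7, 2.9]   # absolute lower confidence bound
upper = [1.1, 2.4, 3.2]   # absolute upper confidence bound

# ribbon = (below, above): nonnegative distances from y, not the bounds themselves
below = y .- lower
above = upper .- y
plot(1:3, y, ribbon = (below, above))
```

Passing (lower, upper) directly instead of (below, above) produces the swapped/oversized band seen above.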

Many thanks to both of you!

I have been able to successfully use both @nilshg’s “manual” method and @mkborregaard’s StatsPlots branch to get the same plots in my simple example.

In my real case, I’m only able to use the “manual” method, which I guess is because my model is slightly more complicated: @formula(y ~ x + x^2). It looks like I’m hitting the limitation mentioned in the PR: “it does only address the bivariate case”.


As a follow-up, would there be a way to extend this to Generalized Linear Models? It looks like predict does not support the interval keyword for such models…

For example, the previous case, with

model = glm(@formula(y ~ x), data, Normal(), IdentityLink())
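For what it’s worth, the “manual” method should carry over to the quadratic formula unchanged, since predict expands the formula terms itself (a sketch with made-up coefficients, not the real use-case data):

```julia
using DataFrames, GLM, Plots

# Synthetic data with a quadratic trend (coefficients chosen for illustration)
data = DataFrame(x = rand(100))
data.y = 1 .+ 2 .* data.x .+ 3 .* data.x.^2 .+ 0.1 .* rand(100)

# predict() handles the x^2 term from the formula automatically
model = lm(@formula(y ~ x + x^2), data)
pred = DataFrame(x = 0:0.01:1)
ci = predict(model, pred, interval = :confidence, level = 0.95)

plot(data.x, data.y, seriestype = :scatter, label = "data", legend = :bottomright)
plot!(pred.x, ci.prediction, linewidth = 3, label = "model",
      ribbon = (ci.prediction .- ci.lower, ci.upper .- ci.prediction))
```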

You’re right, it’s not yet implemented in GLM (it isn’t in R either, if that’s any help). There’s a PR on GLM.jl for it, but it’s bugged (IMHO).
Could you comment about the polynomial case on the PR in StatsPlots? That seems like the first thing to add.

1 Like
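Until interval support lands in GLM.jl, one model-agnostic workaround (my own suggestion, not something from the PR) is a simple case-resampling bootstrap: refit the model on resampled rows and take pointwise quantiles of the predictions. This also matches the original wish of staying as independent of the model as possible.

```julia
using DataFrames, GLM, Statistics, Random

Random.seed!(1)
data = DataFrame(x = rand(100))
data.y = 1 .+ 2 .* data.x .+ 0.1 .* rand(100)
pred = DataFrame(x = 0:0.01:1)

# Refit the GLM on bootstrap resamples of the rows and collect predictions
nboot = 200
boot = Matrix{Float64}(undef, nrow(pred), nboot)
for b in 1:nboot
    rows = rand(1:nrow(data), nrow(data))   # sample rows with replacement
    m = glm(@formula(y ~ x), data[rows, :], Normal(), IdentityLink())
    boot[:, b] = predict(m, pred)
end

# Pointwise 95% band from the bootstrap distribution of predictions
lower = [quantile(boot[i, :], 0.025) for i in 1:nrow(pred)]
upper = [quantile(boot[i, :], 0.975) for i in 1:nrow(pred)]

yhat = predict(glm(@formula(y ~ x), data, Normal(), IdentityLink()), pred)
# plot!(pred.x, yhat, ribbon = (yhat .- lower, upper .- yhat))
```

It is cruder and slower than an analytic interval, but it works for any model that can be refit and predicted from.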

Hilariously, I contributed the predict-with-intervals functionality to GLM like 3 years ago, exactly because I wanted to create support for this – and I still haven’t really gotten around to it. So it’s nice to get a bit of a push.

2 Likes

Will do. Thanks!

I am getting an error (Julia 1.5) with
plot!(model)

ERROR: Cannot convert StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}} to series data for plotting
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] _prepare_series_data(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:8
[3] _series_data_vector(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Dict{Symbol,Any}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:27
[4] macro expansion at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\series.jl:139 [inlined]
[5] apply_recipe(::Dict{Symbol,Any}, ::Type{RecipesPipeline.SliceIt}, ::Nothing, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::Nothing) at C:\Users\hafez.julia\packages\RecipesBase\AN696\src\RecipesBase.jl:282
[6] _process_userrecipes!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\user_recipe.jl:35
[7] recipe_pipeline!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\RecipesPipeline\5RD7m\src\RecipesPipeline.jl:68
[8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:167
[9] #plot!#127 at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:158 [inlined]
[10] plot!(::Plots.Plot{Plots.GRBackend}, ::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:155
[11] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:150
[12] plot!(::StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}) at C:\Users\hafez.julia\packages\Plots\ViMfq\src\plot.jl:144
[13] top-level scope at REPL[12]:1
[14] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088

Did you check out the PR?

To revive this, I currently get:

data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ rand(100);
r = lm(@formula(y ~ x), data)
pred = GLM.predict(r, data, interval = :confidence, level = 0.95)
p = @df data scatter(:x, :y, leg = false,
        xlab = "Model",
        ylab = "Data")
plot!(p, data[!, :x], pred.prediction, linewidth = 2,
        ribbon = (pred.prediction .- pred.lower, pred.upper .- pred.prediction))

FWIW, say for QC, some shameless plug:

using LinearFitXYerrors, DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ rand(100);
st = linearfitxy(data.x, data.y, isplot=true, ratio=:auto)

1 Like

Nice! Is this a standalone plotting package?

@floswald, LinearFitXYerrors.jl is a small Julia package to perform 1D linear fitting of experimental data with possibly correlated errors in the X and Y variables, based on York et al. (2004).

linearfitxy() returns the regression results and uncertainties in a structure. A plot is displayed by setting the optional parameter isplot = true.

1 Like

@floswald for that to work, the data you predict on need to be sorted.

1 Like
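Concretely, sorting the frame by x before fitting and predicting is enough to make the ribbon in the snippet above render correctly (a minimal sketch):

```julia
using DataFrames, GLM, StatsPlots

data = DataFrame(x = rand(100))
data.y = 1 .+ 2 .* data.x .+ rand(100)
sort!(data, :x)   # the ribbon assumes x is in increasing order

r = lm(@formula(y ~ x), data)
pred = GLM.predict(r, data, interval = :confidence, level = 0.95)

p = @df data scatter(:x, :y, leg = false, xlab = "x", ylab = "y")
plot!(p, data.x, pred.prediction, linewidth = 2,
      ribbon = (pred.prediction .- pred.lower, pred.upper .- pred.prediction))
```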