 # Plot the confidence interval for a model fit

I’m no data scientist, so it’s likely that my question does not make much sense… But here goes:

I recently tried to do some data analysis that involved fitting models. I’m able to fit the model, compute represent a prediction that it produces, but I’d like to also (compute and) represent confidence intervals for the model.

Here is my minimal (but incomplete) example:

``````using DataFrames
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

using GLM
model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pred.y = predict(model, pred);

using Plots
plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pred.y, label="model", linewidth=3)
savefig("/tmp/plot.png")
`````` To rephrase (because I’m not even sure to use the correct words here): I’d like to represent the uncertainty about the model using a shaded area around the “model” curve.

Is there a (not too complicated) way to do that? Ideally, the solution should be as independent of the model as possible, because my real use-case involves more complex models…

2 Likes

If you replace your last `plot!` statement (before `savefig`) with:

``````plot!(pred.x, pred.y, ribbon=(fill(0.3,length(pred.x)),0.5.+0.1*cos.(10pi*pred.x)),fc=:orange,fa=0.3,label="model", linewidth=3)
``````

you get:

OK – my example of `ribbon` argument is not realistic, but then I don’t know how to use GLM… If GLM lets you produce prediction confidence intervals, my example should give an idea of how you can do it.

2 Likes

Could you try `]add StatsPlots#mkb/statsmodels`? It’s the branch of this PR https://github.com/JuliaPlots/StatsPlots.jl/pull/293
Check the example in the first comment for usage. If you comment on the PR with your experiences we can merge it and make this available to everyone.

1 Like

EDIT: I accidentally led this thread down a rabbit hole by plotting a prediction interval rather than a confidence interval below, and also bungling the use of `ribbon`. Long story short the ggplot2 plot shown below can also be obtained in Plots.jl/GLM.jl if one calls `predict(model, pred, interval = :confidence, level = 0.95)`.

@BLI has the right plot command, let me add the GLM command you are probably after:

``````
using DataFrames, GLM, Plots
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);

model = lm(@formula(y ~ x), data)
pred = DataFrame(x = 0:0.01:1);
pr = predict(model, pred, interval = :prediction, level = 0.95)

plot(xlabel="x", ylabel="y", legend=:bottomright)
plot!(data.x, data.y, label="data", seriestype=:scatter)
plot!(pred.x, pr.prediction, label="model", linewidth=3,
ribbon = (pr.prediction .- pr.lower, pr.upper .- pr.prediction))
``````

The relevant docstring is:

``````?help> predict

predict(mm::LinearModel, newx::AbstractMatrix;
interval::Union{Symbol,Nothing} = nothing, level::Real = 0.95)

If interval is nothing (the default), return a vector with the predicted values for model mm and new data newx. Otherwise, return a 3-column matrix with the prediction and the lower and upper confidence bounds for a given level (0.95 equates alpha = 0.05).
Valid values of interval are :confidence delimiting the uncertainty of the predicted relationship, and :prediction delimiting estimated bounds for new data points.
``````
2 Likes

@nilshg I believe the ribbon is called wrongly in your example

2 Likes

Ah yes sorry as usual I got confused with what `ribbon` does - let me fix that

They are in fact so small you almost can’t see the ribbon. This is with your code using the StatsPlots PR:

``````using DataFrames, StatsPlots, GLM
data = DataFrame(x = rand(100));
data.y = 1 .+ 2*data.x .+ 0.1*rand(100);
model = lm(@formula(y ~ x), data)

@df data scatter(:x, :y)
plot!(model)
``````

1 Like

Now I’m confused - for my first prediction I get [0.98, 1.10] for upper and lower confidence interval, which looks about right for the ribbon my plot above. What is StatsPlots plotting?

2 Likes

This is how it looks in ggplot2 (same data with RCall)

I think you swapped the upper and the lower confidence interval perhaps? The `ribbon` keyword really sucks for this kind of thing in fact.

1 Like

Many thanks to both of you!

I have been able to successfully use both @nilshg’s “manual” method and @mkborregaard’s `StatsPlot` branch to get the same plots in my simple example.

In my real case, I’m only able to use the “manual” method, which I guess is because my model is slightly mode complicated: `@formula(y ~ x + x^2)`. It looks like I’m hitting the limitations mentioned in the PR : “it does only address the bivariate case”.

As a follow-up, would there be a way to extend this to Generalized Linear Models? It looks like `predict` does not support the `interval` keyword for such models…

For example, the previous case, with

``````model = glm(@formula(y ~ x), data, Normal(), IdentityLink())
``````

You’re right, it’s not yet implemented on GLM (it isn’t in R either if that’s any help). There’s a PR on GLM.jl for it but its bugged (IMHO).
Could you comment the polynomial case on the PR in StatsPlots? That seems like the first thing to add.

1 Like

Hilariously I contributed the `predict` with `intervals` to GLM like 3 years ago, exactly because I wanted to create support for this - and I still haven’t really come around to it. So nice with a bit of a push.

2 Likes

Will do. Thanks!