Weighted linear regression with confidence interval fitted to error bars

Okarin99 · May 7, 2021, 9:10pm

Hi,
I’m pretty new to data science.
I have some data with measurement errors and I want to fit a linear model.

using DataFrames
using GLM
using StatsPlots

df = DataFrame(x = [0.0, 0.0669873, 0.25, 0.5, 0.75, 0.933013, 1.0],
        y = [0.223, 0.291, 0.393, 0.549, 0.73, 0.85, 0.896],
        u_y = [0.023, 0.024, 0.027, 0.031, 0.037, 0.041, 0.043])

# Weighted Least Squares
wlr = glm(@formula(y~x), df, Normal(), wts = 1 ./ df.u_y.^2)

x_interval = DataFrame(x = 0:0.01:1);
pred = predict(wlr, x_interval, interval = :confidence)
@df df scatter(:x, :y, yerror = :u_y)
plot!(x_interval.x, pred.prediction,
        ribbon = (pred.prediction .- pred.lower, pred.upper .- pred.prediction))

I get this result:
[Image: plot]
The confidence interval should get plotted but it is so small that you can’t even see it.
Shouldn’t it be as wide as the measurement errors?
How do I fit it accordingly?

Thank you for all answers and have a nice day.

nilshg · May 8, 2021, 6:47am

Your question is hopefully answered here: “Plot the confidence interval for a model fit”.

rafael.guerra · May 8, 2021, 7:44am

GLM.jl’s glm does not interpret the weights wts as inverse variances but as prior frequencies:

    - `wts::Vector=similar(y,0)`: Prior frequency (a.k.a. case) weights of observations.
      Such weights are equivalent to repeating each observation a number of times equal
      to its weight. Do note that this interpretation gives equal point estimates but
      different standard errors from analytical (a.k.a. inverse variance) weights and
      from probability (a.k.a. sampling) weights which are the default in some other
      software.
      Can be length 0 to indicate no weighting (default).
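
To make the difference concrete, here is a minimal hand-rolled sketch using only LinearAlgebra and the data from the question. It is not GLM.jl's actual code, just one reading of the docstring above: the names `se_analytic` and `se_frequency` are made up for illustration, and the frequency formula assumes each weight is treated as a repeat count, so that the effective sample size becomes `sum(w)`.

```julia
using LinearAlgebra

x   = [0.0, 0.0669873, 0.25, 0.5, 0.75, 0.933013, 1.0]
y   = [0.223, 0.291, 0.393, 0.549, 0.73, 0.85, 0.896]
u_y = [0.023, 0.024, 0.027, 0.031, 0.037, 0.041, 0.043]

X = [ones(length(x)) x]        # design matrix: intercept column, slope column
w = 1 ./ u_y .^ 2              # the weights passed to glm in the question
W = Diagonal(w)

A = X' * W * X
β = A \ (X' * W * y)           # point estimate, identical under both readings
r = y .- X * β                 # residuals

# Inverse-variance (analytic) reading: u_y are known standard deviations.
se_analytic = sqrt.(diag(inv(A)))

# Frequency reading (assumption): observation i counts as w[i] repeats,
# so the residual variance is divided by sum(w) - 2 instead of 7 - 2.
s2 = sum(w .* r .^ 2) / (sum(w) - length(β))
se_frequency = sqrt.(s2 .* diag(inv(A)))
```

With weights in the thousands, the frequency reading behaves as if there were thousands of observations, which would explain a confidence band too thin to see.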

baggepinnen · May 8, 2021, 8:31am

There are two different intervals to consider: the confidence interval for the posterior mean, and the interval for a new measurement (often called a prediction interval). You are talking about the interval for a new measurement, which contains uncertainty about both the mean and the measurement.
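
For the straight-line model the two intervals can be written out as follows. This is a sketch under assumptions not stated in the thread: the errors are independent and normal, $\Sigma$ is the covariance matrix of the fitted intercept and slope, $u_0$ is the measurement uncertainty at a new point $x_0$, and $q$ is the appropriate normal or Student-t quantile.

```latex
\hat{y}(x_0) = \hat\beta_1 + \hat\beta_2 x_0, \qquad
\operatorname{Var}[\hat{y}(x_0)] = \Sigma_{11} + 2 x_0 \Sigma_{12} + x_0^2 \Sigma_{22}

\text{mean: } \hat{y}(x_0) \pm q \sqrt{\operatorname{Var}[\hat{y}(x_0)]}, \qquad
\text{new measurement: } \hat{y}(x_0) \pm q \sqrt{\operatorname{Var}[\hat{y}(x_0)] + u_0^2}
```

The two contributions add in quadrature under the square root, so the interval for a new measurement is always the wider one.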

Okarin99 · May 8, 2021, 5:37pm

So to get the right confidence interval I have to sum up the confidence interval I get from glm with the mean of the measurement errors?

rafael.guerra · May 8, 2021, 10:12pm

As indicated in GLM.jl’s doc above, glm does not handle inverse-variance weighting.

For this purpose, you may use LsqFit.jl.

using DataFrames, LsqFit, Printf, Plots; gr()

df = DataFrame(x = [0.0, 0.0669873, 0.25, 0.5, 0.75, 0.933013, 1.0],
        y = [0.223, 0.291, 0.393, 0.549, 0.73, 0.85, 0.896],
        u_y = [0.023, 0.024, 0.027, 0.031, 0.037, 0.041, 0.043])

x, y = df.x, df.y
wt = 1 ./ df.u_y .^2

p0 = [0.5, 0.5]
m(x, p) = p[1] .+ p[2] * x         # p: model parameters
fit = curve_fit(m, x, y, wt, p0)

cf = coef(fit)
ci = confidence_interval(fit, 0.05)    # 5% significance level

str = @sprintf("Y = (%.2f +/- %.2f) + (%.2f +/- %.2f)*X",
          cf[1],diff([ci[1]...])[1]/2, cf[2],diff([ci[2]...])[1]/2)

tl, bl = ci[1][1] .+ ci[2][2]*x,   ci[1][2] .+ ci[2][1]*x
σp, σm = maximum([tl bl], dims=2) .-  m(x,cf),  m(x,cf) .- minimum([tl bl], dims=2)

plot(x, cf[1] .+ cf[2]*x, color=:lightblue, ribbon=(σp,σm), label=str)
plot!(x, cf[1] .+ cf[2]*x, color=:blues, lw=1, label=false, xlabel="X",ylabel="Y")
scatter!(x,y, ms=3,label=false,mc=:blue, yerror=df.u_y, legend=:topleft)

[Image: LsqFit_weighted_linear_regression]

baggepinnen · May 9, 2021, 11:01am

Maybe, but probably not. It really depends on what question you want to answer. Usually, you use measurements to learn about some underlying system, and you postulate that the system behaves according to the model you have identified. You are then typically interested in learning about the posterior over the model parameters after having seen the data. The confidence interval for a new measurement is usually not very interesting, since you often do not care about the measurement itself; you care about the thing you are trying to measure. It therefore makes sense to plot the confidence bounds as @rafael.guerra did, with a shaded ribbon around the posterior mean of the model prediction in the figure above.
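
As a rough illustration of a band around the model mean, the sketch below propagates the parameter covariance to the fitted line by hand. It is not the method used in the LsqFit.jl post above, and it rests on assumptions: the `u_y` values are taken as the true standard deviations, and `z = 1.96` is the two-sided 95% normal quantile. The names `sigma_mean`, `lower` and `upper` are illustrative.

```julia
using LinearAlgebra

x   = [0.0, 0.0669873, 0.25, 0.5, 0.75, 0.933013, 1.0]
y   = [0.223, 0.291, 0.393, 0.549, 0.73, 0.85, 0.896]
u_y = [0.023, 0.024, 0.027, 0.031, 0.037, 0.041, 0.043]

X = [ones(length(x)) x]             # design matrix: intercept and slope columns
W = Diagonal(1 ./ u_y .^ 2)         # inverse-variance weights

A = X' * W * X
β = A \ (X' * W * y)                # weighted least-squares estimate
Σ = inv(A)                          # parameter covariance (assumes u_y are exact)

xs = range(0, 1; length = 101)      # grid on which the band is evaluated
G  = [ones(length(xs)) xs]          # derivative of the line w.r.t. (intercept, slope)
yhat = G * β                        # fitted line on the grid
sigma_mean = sqrt.(diag(G * Σ * G'))   # standard deviation of the fitted line

z = 1.96                            # assumed 95% normal quantile
lower = yhat .- z .* sigma_mean
upper = yhat .+ z .* sigma_mean
```

The result could be drawn with something like `plot(xs, yhat, ribbon = z .* sigma_mean)`. For this data the standard deviation of the fitted line stays below the largest error bar, consistent with the point that the uncertainty of the mean is smaller than that of a single measurement.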
