Hi all. I’ve found that there are no real definitive guides on truncated vs censored regression in a Bayesian context. The topic gets some treatment in Bayesian texts, but I’ve not found it easy to find worked examples in different PPLs, and I have seen some examples which get truncation and censoring a bit muddled. I believe I’ve got a working example of truncated regression, and I’m hoping to get some confirmation that this is the right way of implementing it in Turing.
First, let’s be clear what truncation is. It is where data outside of some bounds are discarded. The person doing the linear regression on real data knows nothing about the data outside these bounds, nor how much of it there was. In a simple regression context where we truncate our outcome variable y, it would correspond to this…
function truncate(x, y, bounds)
    # keep only the (x, y) pairs whose y value lies strictly inside the bounds
    L, U = bounds
    keep = (y .> L) .& (y .< U)
    return x[keep], y[keep]
end
Then we generate some data and truncate it
using Distributions

# true parameters and sample size
m, c, σ, N = 1, 0, 2, 300
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m .* x .+ c, σ))

# truncate the outcome variable
L, U = -5, 5
xt, yt = truncate(x, y, (L, U))
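As a quick sanity check (not strictly necessary for the example), we can see how much data the truncation step discarded:

# how many observations survived truncation (the exact count varies with the random draw)
println("kept $(length(yt)) of $N observations after truncating to ($L, $U)")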
resulting in a dataset looking like this
Now we can define a regular regression model
using Turing

@model function linear_regression(x, y)
    # priors
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    # likelihood: ordinary normal, ignoring the truncation
    for i in 1:length(x)
        y[i] ~ Normal(m * x[i] + c, σ)
    end
end
And a truncated regression model where the likelihood is truncated at the bounds
@model function truncated_regression(x, y, lims)
    L, U = lims
    # priors
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    # likelihood: a normal truncated to the same bounds used to truncate the data
    for i in 1:length(x)
        y[i] ~ truncated(Normal(m * x[i] + c, σ), L, U)
    end
end
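As I understand it, what truncated does to each likelihood term is renormalise the normal density by the probability mass that falls inside (L, U), which is how the model accounts for the data that was discarded. A minimal check of that, using only Distributions (already loaded above):

# logpdf of a truncated normal = logpdf of the untruncated normal minus the log of
# the probability mass inside the bounds (for a point that lies inside the bounds)
d = Normal(1.0, 2.0)
manual = logpdf(d, 3.0) - log(cdf(d, 5) - cdf(d, -5))
manual ≈ logpdf(truncated(d, -5, 5), 3.0)  # true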
Then we run parameter estimation for both of these models on the truncated data
chain_lin = sample(linear_regression(xt, yt), NUTS(), 5000)
chain_trunc = sample(truncated_regression(xt, yt, (L, U)), NUTS(), 5000)
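Before plotting, a quick look at the posterior mean slopes (the exact numbers will vary from run to run) already shows the difference between the two models:

println("linear regression slope estimate:    ", mean(chain_lin[:m]))
println("truncated regression slope estimate: ", mean(chain_trunc[:m]))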
And we can write some plot code to visualise the data and the posterior predictive fits
using Plots

function visualise(x, y, chain, subplot, true_params)
    m, c = true_params
    # data
    scatter!(x, y, color=:grey, markerstrokewidth=0, label=:none, subplot=subplot)
    # a random subset of posterior samples of the regression line
    xi = [minimum(x), maximum(x)]
    n_samples = length(chain[:m])
    samples_to_draw = 100
    samples_to_plot = sample([1:n_samples;], samples_to_draw; replace=false)
    for i in samples_to_plot
        plot!(xi, chain[:m][i] .* xi .+ chain[:c][i], color=:black, alpha=0.1, label=false, subplot=subplot)
    end
    # posterior mean and true regression lines
    plot!(xi, mean(chain[:m]) .* xi .+ mean(chain[:c]), color=:red, lw=2, label="posterior mean", ylim=[minimum(y) * 1.5, maximum(y) * 1.5], subplot=subplot)
    plot!(xi, m .* xi .+ c, color=:black, lw=3, label="true", ylim=[minimum(y) * 1.5, maximum(y) * 1.5], subplot=subplot)
    plot!(tick_direction=:out)
end
plot(layout=(1,2), size=(1000, 500))
visualise(xt, yt, chain_lin, 1, (m, c))
plot!(title="truncated data\nlinear regression", subplot=1)
hline!([L, U], subplot=1, colour=:black, label=false)
visualise(xt, yt, chain_trunc, 2, (m, c))
plot!(title="truncated data\ntruncated regression", subplot=2)
hline!([L, U], subplot=2, colour=:black, label=false)
resulting in this
As we can see, the linear regression applied to the truncated data gives a biased estimate of the slope. However, applying the truncated regression model to the same truncated data results in a much better estimate of the regression slope.
Does all this look correct? If so, I’ll attempt to do an equivalent example for censored regression.