Building upon my previous example of truncated regression, I thought I’d do a worked example of censored regression. To clarify, censoring is where observations below a lower bound and/or observations above an upper bound are recorded at being at those bounds. An example would be weighting something heavier than the maximum weight that a measuring device can handle, and recording it as the max weight on the dial.
If we think about that in the context of a simple linear regression model, we could be concrete about censoring, defining it in a function:
function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    yc = copy(y)
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            yc[i] = U
        elseif y[i] < L
            censor[i] = -1
            yc[i] = L
        else
            censor[i] = 0
        end
    end
    return x, yc, censor
end
The vector censor has values 0 for uncensored data, -1 for left/lower censored data, and 1 for right/upper censored data. This is necessary for the censored regression model that we’ll construct below.
We can then generate some fake data, censor it, and visualise the process of censoring
# generate data
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m.*x.+c, σ))
# Apply censoring
L, U = -5, 5
xc, yc, censor = censor_data(x, y, L, U)
Our censored regression model can be defined as below. I borrowed from the suggestion by @joshualeond in this post.
using Turing
@model function censored_regression(x, y, censor)
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    for i in 1:length(x)
        if censor[i] == 0  # observed
            Turing.@addlogprob! logpdf(Normal(m*x[i]+c, σ), y[i])
        elseif censor[i] == 1 # right/upper censored
            Turing.@addlogprob! logccdf(Normal(m*x[i]+c, σ), y[i])
        elseif censor[i] == -1 # left/lower censored
            Turing.@addlogprob! logcdf(Normal(m*x[i]+c, σ), y[i])
        end
    end
end
We can then sample and check the chains
chain = sample(censored_regression(xc, yc, censor), NUTS(), 2000)
plot(chain)
Everything looks good. The posterior for m is pretty close, and the posterior for c is slightly low, but nothing overly problematic.
Visualising the posterior predictive confirms that the model is a good fit to the data. By using the censored regression model, we have avoided the bias in estimating the slope that would have occurred with regular regression alone.


