Building upon my previous example of truncated regression, I thought I’d do a worked example of censored regression. To clarify, censoring is where observations below a lower bound and/or above an upper bound are recorded as being at those bounds. An example would be weighing something heavier than the maximum weight a measuring device can handle, and recording it as the maximum weight on the dial.
If we think about that in the context of a simple linear regression model, we can make censoring concrete by defining it in a function:
```julia
function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    yc = copy(y)
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            yc[i] = U
        elseif y[i] < L
            censor[i] = -1
            yc[i] = L
        else
            censor[i] = 0
        end
    end
    return x, yc, censor
end
```
`censor` has values:

- `0` for uncensored data,
- `-1` for left/lower censored data, and
- `1` for right/upper censored data.

This encoding is needed by the censored regression model that we’ll construct below.
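As a quick sanity check of the encoding, here is a small worked example on three hand-picked points (the function is repeated so the snippet runs standalone):

```julia
# censor_data as defined above, repeated so this snippet is self-contained
function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    yc = copy(y)
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            yc[i] = U
        elseif y[i] < L
            censor[i] = -1
            yc[i] = L
        else
            censor[i] = 0
        end
    end
    return x, yc, censor
end

x = [0.0, 1.0, 2.0]
y = [-7.0, 3.0, 9.0]      # one point below -5, one inside, one above 5
_, yc, censor = censor_data(x, y, -5, 5)

yc      # [-5.0, 3.0, 5.0] — values clamped to the bounds
censor  # [-1, 0, 1] — which side each point was censored on
```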
We can then generate some fake data, censor it, and visualise the process of censoring:
```julia
using Distributions

# generate data
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m .* x .+ c, σ))

# apply censoring
L, U = -5, 5
xc, yc, censor = censor_data(x, y, L, U)
```
```julia
using Turing

@model function censored_regression(x, y, censor)
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    for i in 1:length(x)
        if censor[i] == 0
            # observed
            Turing.@addlogprob! logpdf(Normal(m * x[i] + c, σ), y[i])
        elseif censor[i] == 1
            # right/upper censored
            Turing.@addlogprob! logccdf(Normal(m * x[i] + c, σ), y[i])
        elseif censor[i] == -1
            # left/lower censored
            Turing.@addlogprob! logcdf(Normal(m * x[i] + c, σ), y[i])
        end
    end
end
```
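To spell out what those `@addlogprob!` lines are doing: an uncensored point contributes the usual normal density, while a censored point only tells us which side of the bound the true value fell on, so it contributes a tail probability instead. Writing $\mu_i = m x_i + c$, the log-likelihood being accumulated is:

```math
\log \mathcal{L} = \sum_{i \,:\, \text{uncensored}} \log f(y_i \mid \mu_i, \sigma)
+ \sum_{i \,:\, \text{upper}} \log\left[1 - \Phi\!\left(\frac{U - \mu_i}{\sigma}\right)\right]
+ \sum_{i \,:\, \text{lower}} \log \Phi\!\left(\frac{L - \mu_i}{\sigma}\right)
```

where $f$ is the normal density and $\Phi$ the corresponding CDF; `logccdf` and `logcdf` compute the last two terms in a numerically stable way.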
We can then sample and check the chains:
```julia
using StatsPlots

chain = sample(censored_regression(xc, yc, censor), NUTS(), 2000)
plot(chain)
```
Everything looks good. The posterior for m is close to the true slope of 1, and the posterior for c sits slightly below the true intercept of 0, but nothing overly problematic.
Visualising the posterior predictive confirms that the model is a good fit to the data. By using the censored regression model, we have avoided the bias in the slope estimate that would arise from running ordinary regression on the censored data.
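To make that bias concrete, here is a small sketch (using the same generative process and bounds as above, with an assumed seed) that compares the closed-form OLS slope on the uncensored versus the censored responses:

```julia
using Distributions, Random, Statistics

Random.seed!(123)

# same generative process as above
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = m .* x .+ c .+ rand(Normal(0, σ), N)

# censor at the same bounds
L, U = -5, 5
yc = clamp.(y, L, U)

# closed-form simple-regression slope: cov(x, y) / var(x)
ols_slope(x, y) = cov(x, y) / var(x)

ols_slope(x, y)   # close to the true slope of 1
ols_slope(x, yc)  # shrunk noticeably towards zero by the censoring
```

Naively regressing on the censored responses pulls the slope towards zero, because the extreme responses that carry the most information about the slope are exactly the ones pinned to the bounds.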