Confusion regarding linear regression

Blood pressure is normally measured using cuff-based methods. Newer contactless methods for blood pressure measurement rely on measuring the time difference between two biomedical waveforms. This time difference is called the pulse transit time or pulse arrival time (PAT). The relationship between systolic blood pressure (BP) and PAT is not clear, but there are several potential models:
BP = a + b*PAT + noise
BP = c + d*ln(PAT) + noise

I am trying to do traditional linear regression. Is it possible to create chains for the variables a, b, c, d, and the noise, so that I can show the variances of these estimates?

My confusion is that, as far as I know, I can make chain plots for a, b, c, d with Bayesian linear regression, using MCMC as in the Linear Regression source I was following. However, for traditional linear regression I am confused about how to generate chain plots for the variables a, b, c, d. Is it possible to plot chain diagrams for simple linear regression?

What is traditional linear regression? If you mean standard OLS estimated with the closed-form solution to the least squares problem, (X'X)^{-1}X'Y, then inference is normally done based on the asymptotic distribution of the estimator, which is also available in closed form. There's no sampling and therefore no chain.
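For example, here is a minimal sketch of what that looks like in Julia (the ols function and the simulated data are illustrative, not from any particular package):

using LinearAlgebra

# closed-form OLS with classical standard errors (a sketch)
function ols(X, y)
    β̂ = (X'X) \ (X'y)               # least squares solution; same as X \ y
    e = y - X * β̂                    # residuals
    n, k = size(X)
    σ̂² = dot(e, e) / (n - k)         # error variance estimate
    se = sqrt.(diag(σ̂² * inv(X'X)))  # standard errors of β̂
    return β̂, se
end

X = [ones(100) randn(100)]           # made-up data to demonstrate
y = X * [2.0, -1.0] + randn(100)
β̂, se = ols(X, y)                    # inference comes from β̂ ± 1.96*se, no chain involved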

This isn’t really a Julia question, I suppose; you might want to consult a standard textbook that discusses this, such as Wooldridge’s Introductory Econometrics.

Thank you so much. I am new to statistics (linear regression) and am trying to solve problems in Julia as well. When I was trying to solve this, I had the same question, because it did not make sense to me: I thought you could create chains for LLS/OLS linear regression. Anyway, I guess you can draw samples for Bayesian linear regression and form chains using MCMC, which can later provide the mean and variance of the posterior distribution. Thanks.
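For concreteness, here is a minimal sketch of that Bayesian approach using the Turing.jl package; the model, priors, and simulated data below are all illustrative, not from the original problem:

using Turing, StatsPlots

# Bayesian version of BP = a + b*PAT + noise (illustrative priors)
@model function bp_model(pat, bp)
    a ~ Normal(0, 100)                     # intercept prior (wide)
    b ~ Normal(0, 100)                     # slope prior (wide)
    σ ~ truncated(Normal(0, 10); lower=0)  # noise scale prior
    for i in eachindex(bp)
        bp[i] ~ Normal(a + b * pat[i], σ)
    end
end

pat = 0.2 .+ 0.1 .* rand(100)                   # fake PAT data, just to run the sketch
bp = 120.0 .- 50.0 .* pat .+ 5.0 .* randn(100)  # fake BP data
chain = sample(bp_model(pat, bp), NUTS(), 1000) # MCMC sampling
plot(chain)   # trace ("chain") and density plots for a, b, σ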

Traditional linear regression, as was noted above, leads to a point estimator. If the errors are normally distributed, then the small-sample distribution of the estimator is also normal. With non-normal errors, the asymptotic distribution is still normal, but the small-sample distribution is not. Sometimes bootstrapping is used in this context to explore the small-sample distribution, and there it could make sense to represent the bootstrap samples as a chain. Here’s a code example that does that:

using Plots, Distributions, Statistics

# simple iid bootstrap: resample rows of `data` with replacement
function bootstrap(data)
    n = size(data, 1)
    resampled = similar(data)
    for i = 1:n
        j = rand(1:n)                # pick a random row index
        resampled[i, :] = data[j, :]
    end
    return resampled
end

n = 50
reps = 1000
x = [ones(n) randn(n)]               # design matrix: intercept and one regressor
β = [2.0, -1.0]                      # true coefficients
ϵ = rand(Chisq(3.0), n) .- 3.0       # non-normal errors, centered (χ²(3) has mean 3)
y = x * β + ϵ
data = [y x]                         # dependent variable in the first column
bs = zeros(reps, 2)
for i = 1:reps
    d = bootstrap(data)
    bs[i, :] = d[:, 2:end] \ d[:, 1] # OLS on the bootstrap sample
end
plot(bs[:, 2], label=false)          # "chain" of bootstrap slope estimates
q05 = quantile(bs[:, 2], 0.05)
q95 = quantile(bs[:, 2], 0.95)
hline!([q05], label="q05")
hline!([q95], label="q95")
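The plot shows the bootstrap replications of the slope estimate as a "chain", with horizontal lines at the 5th and 95th percentiles, i.e. the bounds of a 90% bootstrap percentile interval.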

Thank you so much. I have another confusion: in traditional linear regression, say OLS/LLS, we do not use the error term when finding the coefficients and y-intercept. Can you explain why?

The assumption is that the errors are not observed, so we need an estimator of the coefficients that depends only on the observed data. That is why OLS minimizes the sum of squared residuals, which are observable, rather than anything involving the unobserved errors themselves.
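As a quick illustration, reusing the simulated x, y, and ϵ from the code above (a sketch):

β̂ = x \ y              # computed from the observed data (x, y) only
residuals = y - x * β̂  # observable stand-ins for the unobserved errors ϵ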

Isn’t this equivalent to the following one-liner:

bootstrap(data) = data[rand(1:end,end),:]

?

Yes, I noticed that too after I posted it. I wrote that function a long time ago… Probably I should fix it to work with arrays of different sizes.
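(For reference: inside an indexing expression, end lowers to the last index along that dimension, so the one-liner is equivalent to

bootstrap(data) = data[rand(1:size(data, 1), size(data, 1)), :]  # draw n row indices with replacement

which resamples rows with replacement exactly like the loop version.)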