Confusion regarding linear regression

Blood pressure is normally measured using cuff-based methods. Newer contactless methods for blood pressure measurement rely on measuring the time difference between two biomedical waveforms. This time difference is called the pulse transit time or pulse arrival time (PAT). The relationship between systolic blood pressure (BP) and PAT is not clear, but there are several potential models:
BP = a + b*PAT + noise
BP = c + d*ln(PAT) + noise

I am trying to do traditional linear regression. Is it possible to create chains for the variables a, b, c, d, and the noise, so that I can show the variances of these estimates?

My confusion is that, as far as I know, I can make chain plots for a, b, c, d with Bayesian linear regression, using MCMC as in the Linear Regression source I was following. However, for traditional linear regression I am confused about how to generate chain plots for the variables a, b, c, d. Is it possible to plot chain diagrams for simple linear regression?

What is traditional linear regression? If you mean standard OLS estimated with the closed-form solution to the least squares problem, (X'X)^{-1}X'Y, then inference is normally done based on the asymptotic distribution of the estimator, which is also available in closed form. There's no sampling and therefore no chain.
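For example, here is a minimal sketch of what that looks like in Julia (the ols function and the simulated data are illustrative, not from any particular package):

using LinearAlgebra

# closed-form OLS with classical standard errors (a sketch)
function ols(X, y)
    β̂ = (X'X) \ (X'y)               # least squares solution; same as X \ y
    e = y - X * β̂                    # residuals
    n, k = size(X)
    σ̂² = dot(e, e) / (n - k)         # error variance estimate
    se = sqrt.(diag(σ̂² * inv(X'X)))  # standard errors of β̂
    return β̂, se
end

X = [ones(100) randn(100)]           # made-up data to demonstrate
y = X * [2.0, -1.0] + randn(100)
β̂, se = ols(X, y)                    # inference comes from β̂ ± 1.96*se, no chain involved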

This isn’t really a Julia question, I suppose; you might want to consult a standard textbook that discusses this, such as Wooldridge’s Introductory Econometrics.

Thank you so much. I am new to statistics (linear regression) and am trying to solve problems in Julia as well. When I was trying to solve this, I had the same question, because it did not make sense to me: I thought you could create chains for LLS/OLS linear regression. Anyway, I guess you can draw samples for Bayesian linear regression and form chains using MCMC, which can later provide the mean and variance of the posterior distribution. Thanks.
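For concreteness, here is a minimal sketch of that Bayesian approach using the Turing.jl package; the model, priors, and simulated data below are all illustrative, not from the original problem:

using Turing, StatsPlots

# Bayesian version of BP = a + b*PAT + noise (illustrative priors)
@model function bp_model(pat, bp)
    a ~ Normal(0, 100)                     # intercept prior (wide)
    b ~ Normal(0, 100)                     # slope prior (wide)
    σ ~ truncated(Normal(0, 10); lower=0)  # noise scale prior
    for i in eachindex(bp)
        bp[i] ~ Normal(a + b * pat[i], σ)
    end
end

pat = 0.2 .+ 0.1 .* rand(100)                   # fake PAT data, just to run the sketch
bp = 120.0 .- 50.0 .* pat .+ 5.0 .* randn(100)  # fake BP data
chain = sample(bp_model(pat, bp), NUTS(), 1000) # MCMC sampling
plot(chain)   # trace ("chain") and density plots for a, b, σ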

Traditional linear regression, as was noted above, leads to a point estimator. If the errors are normally distributed, then the small-sample distribution of the estimator is also normal. With non-normal errors, the asymptotic distribution is still normal, but the small-sample distribution is not. Sometimes bootstrapping is used in this context to explore the small-sample distribution, and there it could make sense to represent the bootstrap samples as a chain. Here’s a code example that does that:

using Plots, Distributions, Statistics

# simple iid bootstrap: resample rows of `data` with replacement
function bootstrap(data)
    n = size(data, 1)
    resampled = similar(data)
    for i = 1:n
        j = rand(1:n)                # pick a random row index
        resampled[i, :] = data[j, :]
    end
    return resampled
end

n = 50
reps = 1000
x = [ones(n) randn(n)]               # design matrix: intercept and one regressor
β = [2.0, -1.0]                      # true coefficients
ϵ = rand(Chisq(3.0), n) .- 3.0       # non-normal errors, centered (χ²(3) has mean 3)
y = x * β + ϵ
data = [y x]                         # dependent variable in the first column
bs = zeros(reps, 2)
for i = 1:reps
    d = bootstrap(data)
    bs[i, :] = d[:, 2:end] \ d[:, 1] # OLS on the bootstrap sample
end
plot(bs[:, 2], label=false)          # "chain" of bootstrap slope estimates
q05 = quantile(bs[:, 2], 0.05)
q95 = quantile(bs[:, 2], 0.95)
hline!([q05], label="q05")
hline!([q95], label="q95")
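The plot shows the bootstrap replications of the slope estimate as a "chain", with horizontal lines at the 5th and 95th percentiles, i.e. the bounds of a 90% bootstrap percentile interval.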

Thank you so much. I have another confusion: in traditional linear regression, say OLS/LLS, we do not use the error term when finding the coefficients and y-intercept. Can you explain why?

The assumption is that the errors are not observed, so we need an estimator of the coefficients that depends only on the observed data. That is why OLS minimizes the sum of squared residuals, which are observable, rather than anything involving the unobserved errors themselves.
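As a quick illustration, reusing the simulated x, y, and ϵ from the code above (a sketch):

β̂ = x \ y              # computed from the observed data (x, y) only
residuals = y - x * β̂  # observable stand-ins for the unobserved errors ϵ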

Isn’t this equivalent to the following one-liner:

bootstrap(data) = data[rand(1:end,end),:]

?

Yes, I noticed that too after I posted it. I wrote that function a long time ago… Probably I should fix it to work with arrays of different sizes.
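(For reference: inside an indexing expression, end lowers to the last index along that dimension, so the one-liner is equivalent to

bootstrap(data) = data[rand(1:size(data, 1), size(data, 1)), :]  # draw n row indices with replacement

which resamples rows with replacement exactly like the loop version.)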