Meaning of `GLM.lm` results (`t` and `Pr(>|t|)`)

bertulli · September 30, 2022, 2:49pm

Hi all!

This is a very noobish question. I wasn't sure whether to post it in the “New to Julia” section, so sorry about that.

Anyway, when I fit a linear model using the GLM package, like this:

```julia
using DataFrames
using CSV
using GLM

df = DataFrame(CSV.File("raw_planar_data.csv"));

fm = @formula(z ~ x + y)
@time(model = lm(fm, df))
```

Julia prints this pretty table:

```
julia> @time(model = lm(fm, df))
  0.048781 seconds (28.79 k allocations: 2.329 MiB, 40.01% gc time, 99.00% compilation time: 100% of which was recompilation)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

z ~ 1 + x + y

Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)  0.0413819  0.36563       0.11    0.9099  -0.675407   0.758171
x            1.99483    0.00960031  207.79    <1e-99   1.97601    2.01365
y            0.994769   0.00950755  104.63    <1e-99   0.97613    1.01341
──────────────────────────────────────────────────────────────────────────
```

As far as I understand, Julia performs a t-test for each parameter $\beta_i$, testing

$$
\begin{aligned} \mathbb{H}_0 &: \beta_i = 0 \\ \mathbb{H}_1 &: \beta_i \neq 0 \end{aligned}
$$

Now, please tell me if I got it right:

- `t` is the value of the t-statistic for each test;
- `Pr(>|t|)` is the p-value for each test;
- `Lower 95%` and `Upper 95%` are the bounds of the 95% confidence interval for each parameter (significance level $\alpha = 0.05$).
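One way to check the interpretation in the list above is to recompute the columns from the fitted model. This is a minimal sketch, assuming DataFrames, GLM and Distributions are installed, and using synthetic data in place of `raw_planar_data.csv` (the data-generating rule z ≈ 2x + y is an assumption chosen to resemble the table). Here `t` is `coef ./ stderror`, `Pr(>|t|)` is the two-sided tail probability of a t distribution with `dof_residual` degrees of freedom, and the interval is `coef ± q * stderror` with `q` the 97.5% quantile of that distribution; the results should agree with `coeftable` and `confint` up to floating-point error.

```julia
using DataFrames, GLM, Distributions, Random

# Hypothetical synthetic data standing in for raw_planar_data.csv (z ≈ 2x + y + noise)
Random.seed!(42)
n = 100
df = DataFrame(x = 10 .* randn(n), y = 10 .* randn(n))
df.z = 2 .* df.x .+ df.y .+ randn(n)

model = lm(@formula(z ~ x + y), df)

β = coef(model)          # "Coef." column
se = stderror(model)     # "Std. Error" column
ν = dof_residual(model)  # residual degrees of freedom: n minus the number of coefficients
tdist = TDist(ν)

t = β ./ se                                  # "t": estimate divided by its standard error
p = [2 * ccdf(tdist, abs(ti)) for ti in t]   # "Pr(>|t|)": two-sided p-value under H0: βᵢ = 0
q = quantile(tdist, 0.975)                   # critical value for a two-sided 95% interval
lower = β .- q .* se                         # "Lower 95%"
upper = β .+ q .* se                         # "Upper 95%"

ct = coeftable(model)    # the table GLM prints
println(ct)
println(hcat(t, p, lower, upper))
println(confint(model))  # 95% intervals by default
```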

Is this all correct, or did I interpret something wrongly?

Thanks!

P.S., side question (if you like): how is the standard error calculated in a linear regression test?
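For the side question: in ordinary least squares the usual textbook estimate is $\mathrm{SE}(\hat\beta_i) = \sqrt{\hat\sigma^2 \, [(X^\top X)^{-1}]_{ii}}$ with $\hat\sigma^2 = \mathrm{RSS}/(n - k)$, where $k$ is the number of coefficients, and the t statistic is $\hat\beta_i / \mathrm{SE}(\hat\beta_i)$. Below is a minimal sketch using only the standard library, on hypothetical synthetic data (data and variable names are assumptions, not from the thread). It should reproduce what `stderror` reports for `lm` on the same data, although GLM solves the system with a Cholesky factorization rather than the `\` used here.

```julia
using LinearAlgebra, Random

# Hypothetical synthetic data with the same shape as the thread's example (z ≈ 2x + y)
Random.seed!(1)
n = 100
x = 10 .* randn(n)
y = 10 .* randn(n)
z = 2 .* x .+ y .+ randn(n)

X = [ones(n) x y]        # design matrix: intercept column, then x and y
k = size(X, 2)           # number of coefficients

β̂ = X \ z               # least-squares estimate (QR-based solve)
r = z .- X * β̂          # residuals
ν = n - k                # residual degrees of freedom
σ̂² = dot(r, r) / ν      # unbiased estimate of the error variance, RSS / (n - k)

V = σ̂² .* inv(X' * X)   # estimated covariance matrix of β̂
se = sqrt.(diag(V))      # standard errors: square roots of the diagonal
t = β̂ ./ se             # t statistics for H0: βᵢ = 0

println("coef:       ", β̂)
println("std. error: ", se)
println("t:          ", t)
```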

rmsmsgood · September 30, 2022, 3:02pm

You got it all right. By the way, `@time` doesn’t require parentheses, so you can write:

```julia
@time model = lm(fm, df)
```

pdeffebach · September 30, 2022, 3:06pm

This sounds suspiciously like a homework problem. There are plenty of resources online to learn about how standard errors are calculated.

bertulli · September 30, 2022, 3:25pm

It isn’t, but I see why it looked like one, sorry. The more complete question should have been: “what is the statistic used to assess the mean of a coefficient in an LM, from which I can then calculate the standard error?”. I have taken a stats course at university, but it didn’t cover linear regression. Anyway, I have probably found the answer, and it’s already too advanced for my curiosity-driven study, so I think I’ll just trust the library and use the results.

pdeffebach · September 30, 2022, 3:54pm

As a resource, Stock and Watson's *Introduction to Econometrics* has a very good description of OLS.

dlakelan · September 30, 2022, 5:58pm

If you move quickly away from Frequentist stats and towards Bayesian stats then the answer is always very simple: everything is derived from the posterior distribution.

The frequentist tests for regression stuff can mostly be seen as approximations to Bayes under some improper prior distribution.

I just always do Bayes, but sometimes I do GLM-type stuff and interpret it as a convenient, quick approximation of Bayes.
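The claim that frequentist regression output approximates Bayes under an improper prior can be checked numerically for the linear model. This is a sketch assuming the standard noninformative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$ (one choice of improper prior; the post does not name one) and hypothetical synthetic data. The posterior is sampled exactly: $\sigma^2 \mid \text{data} \sim \mathrm{RSS}/\chi^2_{n-k}$, then $\beta \mid \sigma^2 \sim \mathcal{N}(\hat\beta, \sigma^2 (X^\top X)^{-1})$.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical synthetic data (z ≈ 2x + y + noise), not the thread's CSV file
Random.seed!(2)
n = 200
x, y = 10 .* randn(n), 10 .* randn(n)
z = 2 .* x .+ y .+ randn(n)
X = [ones(n) x y]
k = size(X, 2)
ν = n - k

# Frequentist quantities, as lm would report them
β̂ = X \ z
rss = sum(abs2, z .- X * β̂)
V = inv(X' * X)
se = sqrt.((rss / ν) .* diag(V))

# Exact posterior under the improper prior p(β, σ²) ∝ 1/σ²:
#   σ² | data ~ RSS / χ²_ν   and   β | σ², data ~ Normal(β̂, σ² V)
L = cholesky(Symmetric(V)).L
ndraws = 20_000
draws = Matrix{Float64}(undef, ndraws, k)
for s in 1:ndraws
    σ² = rss / sum(abs2, randn(ν))   # χ²_ν drawn as a sum of ν squared N(0, 1) variates
    draws[s, :] = β̂ .+ sqrt(σ²) .* (L * randn(k))
end

post_mean = vec(mean(draws; dims = 1))
post_sd = vec(std(draws; dims = 1))

println("lm estimate:    ", β̂)
println("posterior mean: ", post_mean)
println("std. error:     ", se)
println("posterior sd:   ", post_sd)   # expected ≈ se * sqrt(ν / (ν - 2)) for a t posterior
```

Under this prior each $\beta_i$ has a Student-t marginal posterior with $n - k$ degrees of freedom, location $\hat\beta_i$ and scale equal to its standard error, so the central 95% credible interval coincides with the `Lower 95%`/`Upper 95%` columns. With a proper, informative prior the two would generally differ.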

bertulli · October 1, 2022, 2:54pm

To @pdeffebach: thanks, but my question was about the test being performed. Are you suggesting that book because the test is part of the standard OLS procedure?

To @dlakelan: I still haven't grasped the difference between frequentist and Bayesian statistics. Is it important for conducting a multiple linear regression? Or can I “just trust Julia” (and, say, accept the parameter when Pr(>|t|) < 0.05 as usual)?

dlakelan · October 1, 2022, 4:05pm

This accept/reject stuff is definitely what’s wrong with much of Frequentist stats. For example, if you get Pr(>|t|) = 0.07, will you “accept that the slope really is zero”? That is a very poor way to do things. The proper interpretation is rather that you have insufficient information to determine what sign the slope should have. In the real world almost nothing is exactly 0, and a small sample size is no reason to conclude strongly that a parameter actually is 0. Similarly, if p < 0.05 in one dataset and p > 0.05 in another, it is very wrong to say that in condition one the parameter is nonzero and approximately equal to its estimate, while in condition two it is exactly 0, and therefore the difference between the effects is such and such…
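The last point can be made concrete with a little arithmetic. This sketch uses hypothetical numbers (not from the thread), assumes the two estimates come from independent datasets so that their variances add, and uses the large-sample normal critical value 1.96 instead of a t quantile. One estimate clears the 5% threshold and the other does not, yet the difference between them is nowhere near significant.

```julia
# Hypothetical estimates from two independent datasets (illustrative numbers only)
b1, se1 = 0.25, 0.10   # condition one
b2, se2 = 0.15, 0.10   # condition two

zcrit = 1.96           # two-sided 5% critical value of the standard normal (large-sample approximation)

z1 = b1 / se1                              # 2.5: "significant"
z2 = b2 / se2                              # ≈ 1.5: "not significant"
zdiff = (b1 - b2) / sqrt(se1^2 + se2^2)    # ≈ 0.71: independent estimates, so variances add

for (name, z) in (("condition one", z1), ("condition two", z2), ("difference", zdiff))
    println(rpad(name, 14), "z = ", round(z; digits = 2), "   |z| > 1.96: ", abs(z) > zcrit)
end
```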

It is worth it, if only to avoid the many, many logical fallacies that non-specialists commit when following the usual rituals of Null Hypothesis Significance Testing.

If you have not already had too much standard stats education, you are in a good position to avoid making these mistakes. Perhaps look into Kruschke’s “Doing Bayesian Data Analysis” or a similar introductory book, mainly to build up a proper intuition for valid inferences rather than the usual fallacies.

See also “Scientists rise up against statistical significance”.

bertulli · October 1, 2022, 4:22pm

Thanks for the suggestion! Yes, I know that “not rejecting the null hypothesis” doesn’t mean “the null hypothesis is true”, but you’re right: knowing myself, I would have gotten distracted and assumed it.

My question (a bit too pragmatic, I admit) was: “is this statistic solid enough to trust the usual significance level (0.05) in a typical regression problem?”. The answer, I now understand, is “it depends”.

One minor thing: what do you mean by “you are in a good position to avoid making these mistakes”?

I would have said the opposite: “since I have only a basic stats education, I am especially prone to error”.

dlakelan · October 1, 2022, 4:30pm

I mean, it will be easier for you to unlearn the wrong thinking you were taught in one semester than the wrong thinking you would have developed over several years of a stats master's degree, etc.
