Efficient way of doing linear regression

What about one of the following approaches?

using Random

# Simulated data: N draws of x, a design matrix with an intercept column,
# and a noiseless response y = 10 + 0.3x.
N = 10000
x = rand(N)
X = [ones(N) x]
y = 10 .+ x .* 0.3

# Normal-equations approach, β̂ = (X'X)⁻¹X'y. Note that `\` and `*`
# associate left to right, so the expression parses as ((X' * X) \ X') * y
# rather than (X' * X) \ (X' * y).
function linreg1(y, X)
    β_hat = (X' * X) \ X' * y
    return β_hat
end

# Least-squares solve of the tall system Xβ = y via `\` (QR-based).
function linreg2(y, X)
    β_hat = X \ y
    return β_hat
end
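
As a quick sanity check (a minimal sketch, assuming only the definitions above), both functions should return the same coefficients and, because y was generated without noise, recover the true values [10.0, 0.3] up to floating-point error:

β1 = linreg1(y, X)
β2 = linreg2(y, X)
@assert β1 ≈ β2           # both solvers agree
@assert β1 ≈ [10.0, 0.3]  # true intercept and slope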

using GLM
# Fit a LinearModel directly on the design matrix X.
GLM.fit(LinearModel, X, y, true)

using DataFrames, GLM
# Fit the same model through the formula interface on a DataFrame.
data = DataFrame(X = x, Y = y)
lm(@formula(Y ~ X), data)
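
The GLM fits can be compared against the matrix-based estimates through `coef` (a minimal sketch, assuming the objects defined above):

m1 = GLM.fit(LinearModel, X, y, true)
m2 = lm(@formula(Y ~ X), data)
coef(m1) ≈ coef(m2)  # expected to agree with linreg1/linreg2 as well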


using BenchmarkTools
@benchmark linreg1(y, X)
@benchmark linreg2(y, X)
@benchmark GLM.fit(LinearModel, X, y, true)
@benchmark lm(@formula(Y ~ X), data)
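
Since y, X, and data are non-constant globals, BenchmarkTools recommends interpolating them with $ so the timings exclude global-variable overhead; an interpolated variant (sketch) would be:

@benchmark linreg1($y, $X)
@benchmark linreg2($y, $X)
@benchmark GLM.fit(LinearModel, $X, $y, true)
@benchmark lm(@formula(Y ~ X), $data)

(The results below were taken with the un-interpolated calls shown above.)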

And the results are:

linreg1:

BenchmarkTools.Trial: 
  memory estimate:  156.78 KiB
  allocs estimate:  8
  --------------
  minimum time:     213.900 μs (0.00% GC)
  median time:      289.351 μs (0.00% GC)
  mean time:        552.780 μs (4.66% GC)
  maximum time:     76.192 ms (0.00% GC)
  --------------
  samples:          8836
  evals/sample:     1

linreg2:

BenchmarkTools.Trial: 
  memory estimate:  303.69 KiB
  allocs estimate:  45
  --------------
  minimum time:     118.300 μs (0.00% GC)
  median time:      170.300 μs (0.00% GC)
  mean time:        344.598 μs (17.20% GC)
  maximum time:     57.911 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

fit():

BenchmarkTools.Trial: 
  memory estimate:  470.45 KiB
  allocs estimate:  26
  --------------
  minimum time:     133.700 μs (0.00% GC)
  median time:      189.299 μs (0.00% GC)
  mean time:        428.768 μs (17.15% GC)
  maximum time:     45.171 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

lm():

BenchmarkTools.Trial: 
  memory estimate:  1.08 MiB
  allocs estimate:  186
  --------------
  minimum time:     546.600 μs (0.00% GC)
  median time:      839.500 μs (0.00% GC)
  mean time:        1.917 ms (10.94% GC)
  maximum time:     127.190 ms (0.00% GC)
  --------------
  samples:          2588
  evals/sample:     1