Accelerating linear methods

Leticia-maria · May 31, 2023, 7:14pm

Dear Julia friends, I am benchmarking Julia for linear regression models and trying to optimize that for high-dimensional spaces. Any suggestions on the code below to reduce allocations and speed up the code? Thank you so much

using Random
using GLM
using DataFrames
using BenchmarkTools

function benchmark(dimensions::Int)
    Random.seed!(1234)
    X = randn(100000, dimensions)
    β = randn(dimensions)
    y = X * β + 0.1 * randn(100000)

    # Avoiding column name generation
    colnames = [Symbol("x$i") for i in 1:dimensions]
    push!(colnames, :y)
    data = DataFrame(hcat(X, y), colnames)  # Creating DataFrame in one step to avoid extra allocations

    formula = @inbounds Term(:y) ~ sum(Term(Symbol("x$i")) for i in 1:dimensions)

    GLM.lm(formula, data)
end

for dims in [10, 100, 1000, 10000]
    println("Warming up for dimension $dims")
    benchmark(dims)  # Warm-up run

    println("Benchmarking for dimension $dims")
    @btime benchmark($dims)  # Using BenchmarkTools.@btime for more accurate timing and allocation measurement
end

gdalle · May 31, 2023, 7:51pm

I just profiled the code with @profview benchmark(1000) and on my computer, this call spent roughly

50% of the time in lm, which is split pretty evenly between ModelMatrix, ModelFrame and fit
30% of the time in DataFrame (constructing data)
20% of the time in randn (generating X)

One would assume the fit method does the actual work, everything else is just moving stuff around in memory. What is the context of this function? Will you call it several times? Can you reuse X or data?

stevengj · May 31, 2023, 7:55pm

Have you considered just skipping the dataframe and GLM stuff and just doing fit = X \ y directly? That’s all it’s doing in the end.

(Reducing it to X \ y lets you further optimize things by customizing the method of the linear solve, doing things in-place, etcetera. It may also trim off a bunch of overhead.)

jar1 · May 31, 2023, 7:58pm

lm also calculates other values like t-statistics etc. It would be cool to have more control over the solver though.

pdeffebach · May 31, 2023, 7:59pm

You might want to try FixedEffectModels.jl, which also does linear regression but may be faster.

Also, your benchmark includes creating the DataFrame. You should split the data creation and estimation process into separate pieces to best analyze the performance of the estimation.

Leticia-maria · May 31, 2023, 11:48pm

Yes, I am actually calling it several times (like thousands), so any small improvement is critical

Dan · June 1, 2023, 12:10am

fit or X \ y complexity also scale differently from other operations (cubic in dimension, IIRC), and therefore testing benchmark(10_000) should spend most time there and optimizing the rest will be less crucial.
Having said this, for smaller problems it is always nice to reduce overhead.
One would want to minimize copying of the data (as just generating it takes 20% with dimension 1_000). Creating the DataFrame with views to columns might help, by adding copycols=false like so:

data = DataFrame(hcat(X, y), colnames; copycols=false)

and maybe to avoid hcat copying, using some in-place versions:

function benchmark(dimensions::Int)
    Random.seed!(1234)
    D = Matrix{Float64}(undef, 100_000, dimensions+1)
    randn!(@view D[:,1:end-1])
    β = randn(dimensions)
    @views D[:,end] .= D[:,1:end-1] * β + 0.1 * randn(100000)

    # Avoiding column name generation
    colnames = [Symbol("x$i") for i in 1:dimensions]
    push!(colnames, :y)
    data = DataFrame(D, colnames; copycols=false)  # Creating DataFrame in one step to avoid extra allocations
...

gdalle · June 1, 2023, 5:03am

Do you always have the same X? or the same y? Do they at least keep the same dimensions?

nilshg · June 1, 2023, 5:38am

This discussion was had before, much of it is likely unchanged:

I agree with Peter, if you are fitting models with high dimensional fixed effects look at FixedEffectModels which is designed for this and also has GPU support.

Ajay_Shah · June 2, 2023, 6:50am

The R world has a nice separation between a small lm.fit() which is fast and the full lm() which does more things. We should think similarly.

Mousum Dutta has been building a QR-decomposition based engine for GLM.jl which is more precise. Here also, we see the tradeoffs between a fast and precise method. It’s good to have both in the implementation and in the function design.

Leticia-maria · June 3, 2023, 11:15pm

They have the same dimension: I have posted a more detailed code here: Error when processing data from file for linear regression

Leticia-maria · June 3, 2023, 11:15pm

That would be great, we would have more control

Topic		Replies	Views
Optimizing Calculation in Julia compared to C (New to Julia) Performance	25	2500	January 1, 2020
GLM is slow on large datasets. Using OnlineStats for regressions? MixedModels? Performance glm	25	5043	November 26, 2018
Speed up column looping in matrix Performance	6	740	July 26, 2021
Iteration over 10 columns to calculate LinearRegression Machine Learning	4	367	March 27, 2021
Simulation of regression coefficients New to Julia first-steps , dataframes	7	559	June 29, 2021

Accelerating linear methods

Related topics