I would like to use a GLM to model an integer variable as following a Poisson distribution, where the rate parameter is determined by the independent variable X. X lives in a non-Euclidean space, so a vanilla GLM tends not to provide a very good fit. I was thinking of using Laplacian regularisation to improve the fit: my understanding is that I can add a regularisation term of the form \alpha \beta' L \beta, where \beta is the vector of regression coefficients, L is the Laplacian of my non-Euclidean manifold, and \alpha is the regularisation strength.

Is there a package that can do this? My understanding is that GLM.jl does not support regularisation. I also looked at MLJLinearModels.jl, but I couldn't figure out how to add custom regularisation.
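Written out, the objective I'm minimizing is the (averaged) Poisson negative log-likelihood plus the Laplacian penalty,

$$
\ell(\beta) = \frac{1}{n}\sum_{i=1}^{n}\Big[\exp(x_i^\top \beta) - y_i\, x_i^\top \beta + \log(y_i!)\Big] + \alpha\, \beta^\top L \beta,
$$

where $x_i$ is the $i$-th column of $X$; this is exactly what `lossfunc2` in the code below computes.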
I basically got something working by doing the following:
using ReverseDiff
using ADTypes
using Optim
using Random
using StatsBase
using SpecialFunctions
function lossfunc2(β::AbstractVector{<:Real}, X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real}, L::AbstractMatrix{<:Real}, α::Real)
    η = X' * β
    λ = exp.(η)
    # averaged Poisson negative log-likelihood ...
    ll1 = mean(-y .* η .+ loggamma.(y .+ 1) .+ λ)
    # ... plus the Laplacian penalty
    ll2 = α * β' * L * β
    ll1 + ll2
end
function fit_glm_2(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        # β0 was supplied without an intercept coefficient; append one
        push!(β0, randn(T))
    end
    # pad L with a zero row/column so the intercept is not penalised
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    # append a row of ones to X for the intercept
    Xq = [X; ones(T, 1, n)]
    lf(β) = lossfunc2(β, Xq, y, L2, α)
    q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true); autodiff=AutoReverseDiff())
end
function test()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_2(Xe, xe, L; α=0.01)
end
test()
I would like to speed it up, though, since I’m looking at fitting at least a few hundred models with a much larger parameter space. Does anyone know how to do the same fit using Reactant with Enzyme?
The right way is to use Reactant and Enzyme directly by compiling your entire optimization loop (ping @avikpal).
In the meantime I’m working on an easy way which requires minimal code changes but only compiles the gradient computation, so it will be slower than whatever Avik can offer you:
- Install the development version of DifferentiationInterface.jl as follows:
using Pkg
Pkg.add(;
    url="https://github.com/JuliaDiff/DifferentiationInterface.jl",
    subdir="DifferentiationInterface",
    rev="gd/reac",
)
- Replace `AutoReverseDiff()`[^1] with `AutoReactant(AutoEnzyme(; function_annotation = Enzyme.Const))` (see the sketch after this list).
- Relax your type constraints from `<:Real` to `<:Number` so that Reactant numbers are accepted.
- Enjoy.
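For concreteness, the changed `optimize` call inside `fit_glm_2` would look roughly like this (a sketch; I'm assuming the dev branch above provides `AutoReactant`, and that Enzyme and Reactant are loaded):

```julia
using Enzyme, Reactant  # plus the dev DifferentiationInterface installed above

# same loss and starting point as before; only the AD backend changes
q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true);
             autodiff=AutoReactant(AutoEnzyme(; function_annotation=Enzyme.Const)))

# cheaper alternative without Reactant (see footnote): a compiled ReverseDiff tape
# autodiff=AutoReverseDiff(; compile=true)
```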
Right now this is still experimental, and it seems not to be worth it on your small problem (the Reactant compilation takes up most of the time, especially because DI needs to duplicate it x4). I assume that if you run your optimization longer or on much larger problems, it will become relevant. But ideally you’d mutualize Reactant compilation across solves, and I don’t think Optim.jl allows you to do that.
[^1]: You would also get a sizeable benefit by just using `AutoReverseDiff(; compile=true)`, since your code has no control flow.
Optimization.jl iirc lets you pass in a custom gradient function. You can get the best of all worlds by using Reactant to compile the gradient once, and then it can be re-used for as many models as you like.
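A rough sketch of that pattern, reusing `lossfunc2` from above (hypothetical: `grad_loss` is just an illustrative name, and I haven't checked how the compiled gradient interacts with the plain vectors Optimization.jl hands back):

```julia
using Enzyme, Reactant

# gradient of the loss with respect to β only; data and α are held constant
grad_loss(β, X, y, L, α) =
    Enzyme.gradient(Enzyme.Reverse, Const(lossfunc2), β,
                    Const(X), Const(y), Const(L), Const(α))[1]

# β0, Xq, y, L2, α as constructed in fit_glm_2 above:
# move the (fixed-size) problem data to Reactant arrays and compile the gradient once
βr, Xr, yr, Lr = Reactant.to_rarray.((β0, Xq, y, L2))
cgrad = Reactant.@compile grad_loss(βr, Xr, yr, Lr, α)

# `cgrad` can then be wrapped as the `grad` of an OptimizationFunction and
# reused for every model of the same size
```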
Though longer term, Optimization.jl ought to add first-class Reactant support so it can fuse the actual parameter update with the gradient (which will not be available in DI).
Even longer term, Reactant ought to be able to compile the entire fit_glm_2 function, with AutoEnzyme used inside it (though I would hazard a guess that this doesn't work at the moment and requires some work from the Optimization.jl folks).
Long term it would be something like:
using Enzyme
using Reactant
using ADTypes
using Optim
using Random
using StatsBase
using SpecialFunctions
function lossfunc2(β::AbstractVector, X::AbstractMatrix, y::AbstractVector, L::AbstractMatrix, α)
    η = X' * β
    λ = exp.(η)
    ll1 = mean(-y .* η .+ loggamma.(y .+ 1) .+ λ)
    ll2 = α * β' * L * β
    ll1 + ll2
end
function fit_glm_2(X::AbstractMatrix, y::AbstractVector, L::AbstractMatrix; β0=nothing, α=1.0)
    d, n = size(X)
    T = eltype(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β) = lossfunc2(β, Xq, y, L2, α)
    q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true); autodiff=AutoEnzyme())
end
function test()
    Xe = Reactant.to_rarray(randn(300, 1000))
    xe = Reactant.to_rarray(rand(1:5, 1000))
    L = Reactant.to_rarray(randn(300, 300))
    cfit_glm_2 = Reactant.@compile fit_glm_2(Xe, xe, L; α=0.01)
    cfit_glm_2(Xe, xe, L; α=0.01)
end
test()
but it looks like there's a missing promote in Optim:
[3] promote_objtype
@ ~/.julia/packages/Optim/gmigl/src/multivariate/optimize/interface.jl:70 [inlined]
You can also just spend 5 minutes to work out the reverse-mode gradient by hand (also a good excuse to learn some matrix calculus). AD is a great tool, but writing a simple derivative like this yourself gives you a lot more control, especially if there are big matrices involved, and it will also help you understand what AD is doing.
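For the loss above the derivation is short: with η = X'β and λ = exp.(η), the gradient is X(λ − y)/n + α(L + L')β, which reduces to X(λ − y)/n + 2αLβ when L is symmetric (as a graph Laplacian is). As a sketch in code (`lossgrad` is just an illustrative name):

```julia
# hand-derived gradient of lossfunc2, assuming a symmetric L
function lossgrad(β, X, y, L, α)
    λ = exp.(X' * β)
    X * (λ .- y) ./ length(y) .+ 2α .* (L * β)
end
```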
This is something I try to do in general, actually. For this particular problem, finding the analytical gradient is not hard, but I'm also interested in how to solve more general problems using optimisation with Enzyme/Reactant.
I gave this solution a try, but I ran into an error:
"""
Same as fit_glm_2 but using Optimization with defined gradient
"""
function fit_glm_3(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β, p) = lossfunc2(β, p...)
    function enzyme_gradient(β0, (Xe, xe, L, α))
        return Enzyme.gradient(Enzyme.Reverse, Const(lossfunc2), β0, Const(Xe), Const(xe), Const(L), Const(α))[1]
    end
    qf = OptimizationFunction(lf; grad=enzyme_gradient)
    prob = OptimizationProblem(qf, β0, (Xq, y, L2, α))
    sol = solve(prob, LBFGS())
end
function test3()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_3(Xe, xe, L; α=0.01)
end
julia> test3()
ERROR: MethodError: no method matching (::var"#enzyme_gradient#31")(::Vector{Float64}, ::Vector{Float64}, ::Tuple{Matrix{Float64}, Vector{Int64}, Matrix{Float64}, Float64})
The function `enzyme_gradient` exists, but no method is defined for this combination of argument types.
Closest candidates are:
(::var"#enzyme_gradient#31")(::Any, ::Any)
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:47
Stacktrace:
[1] (::OptimizationBase.var"#grad#204"{Tuple{…}, OptimizationFunction{…}})(G::Vector{Float64}, x::Vector{Float64})
@ OptimizationBase ~/.julia/packages/OptimizationBase/77acV/src/function.jl:121
[2] (::OptimizationOptimJL.var"#8#14"{OptimizationCache{…}, OptimizationOptimJL.var"#7#13"{…}})(G::Vector{Float64}, θ::Vector{Float64})
@ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/2AQnq/src/OptimizationOptimJL.jl:175
[3] value_gradient!!(obj::TwiceDifferentiable{Float64, Vector{Float64}, Matrix{Float64}, Vector{Float64}}, x::Vector{Float64})
@ NLSolversBase ~/.julia/packages/NLSolversBase/n7XXO/src/interface.jl:82
[4] initial_state(method::LBFGS{…}, options::Optim.Options{…}, d::TwiceDifferentiable{…}, initial_x::Vector{…})
@ Optim ~/.julia/packages/Optim/gmigl/src/multivariate/solvers/first_order/l_bfgs.jl:168
[5] optimize(d::TwiceDifferentiable{…}, initial_x::Vector{…}, method::LBFGS{…}, options::Optim.Options{…})
@ Optim ~/.julia/packages/Optim/gmigl/src/multivariate/optimize/optimize.jl:43
[6] __solve(cache::OptimizationCache{OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, LBFGS{…}, Bool, OptimizationOptimJL.var"#3#5", Nothing})
@ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/2AQnq/src/OptimizationOptimJL.jl:227
[7] solve!
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:234 [inlined]
[8] #solve#741
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:131 [inlined]
[9] solve
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:128 [inlined]
[10] fit_glm_3(X::Matrix{Float64}, y::Vector{Int64}, L::Matrix{Float64}; β0::Nothing, α::Float64)
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:52
[11] fit_glm_3
@ ~/Documents/programming/julia/Hippocampus/src/test3.jl:36 [inlined]
[12] test3()
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:66
[13] top-level scope
@ REPL[62]:1
Some type information was truncated. Use `show(err)` to see complete types.
Sorry, my fault for not reading the manual properly. The gradient function is assumed to be in-place by default, so its first argument should be a vector to hold the gradient values. This works:
"""
Same as fit_glm_2 but using Optimization with defined gradient
"""
function fit_glm_3(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β, p) = lossfunc2(β, p...)
    function enzyme_gradient!(G, β0, (Xe, xe, L, α))
        Enzyme.autodiff(Enzyme.Reverse, Const(lossfunc2), Duplicated(β0, G), Const(Xe), Const(xe), Const(L), Const(α))
    end
    qf = OptimizationFunction(lf; grad=enzyme_gradient!)
    prob = OptimizationProblem(qf, β0, (Xq, y, L2, α))
    sol = solve(prob, LBFGS())
end
function test3()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_3(Xe, xe, L; α=0.01)
end
Hm, looks like I spoke too soon. LBFGS() returned a failure code without throwing any errors. If I switch to OptimizationOptimisers.Adam, it looks like the objective function is no longer called with parameters?
I simplified my example to the Rosenbrock function from the Optimization.jl documentation, but using Enzyme to compute the gradient:
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

function solve_rosenbrock()
    x0 = zeros(2)
    _p = [1.0, 100.0]
    function enzyme_gradient!(G, x0, p)
        Enzyme.autodiff(Enzyme.Reverse, Const(rosenbrock), Duplicated(x0, G), Const(p))
    end
    optf = SciMLBase.OptimizationFunction(rosenbrock; grad=enzyme_gradient!)
    prob = SciMLBase.OptimizationProblem(optf, x0, _p)
    sol = solve(prob, OptimizationOptimisers.Adam(0.05), maxiters=1000, progress=false)
end
julia> sol = solve_rosenbrock()
ERROR: MethodError: no method matching rosenbrock(::Vector{Float64})
The function `rosenbrock` exists, but no method is defined for this combination of argument types.
Closest candidates are:
rosenbrock(::Any, ::Any)
@ Main ~/Documents/programming/julia/Hippocampus/src/test4.jl:6
Stacktrace:
[1] (::OptimizationFunction{…})(args::Vector{…})
@ SciMLBase ~/.julia/packages/SciMLBase/hHrRi/src/scimlfunctions.jl:4261
[2] __solve(cache::OptimizationCache{Optimisers.Adam{…}, true, OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, Bool, OptimizationOptimisers.var"#3#5", Nothing})
@ OptimizationOptimisers ~/.julia/packages/OptimizationOptimisers/atr4L/src/OptimizationOptimisers.jl:89
[3] solve!(cache::OptimizationCache{Optimisers.Adam{…}, true, OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, Bool, OptimizationOptimisers.var"#3#5", Nothing})
@ OptimizationBase ~/.julia/packages/OptimizationBase/R2xIG/src/solve.jl:205
[4] solve(::OptimizationProblem{…}, ::Optimisers.Adam{…}; kwargs::@Kwargs{…})
@ OptimizationBase ~/.julia/packages/OptimizationBase/R2xIG/src/solve.jl:96
[5] solve_rosenbrock()
@ Main ~/Documents/programming/julia/Hippocampus/src/test4.jl:16
[6] top-level scope
@ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.