I would like to use a GLM to model an integer variable as following a Poisson distribution, where the rate parameter is determined by the independent variable X. X lives in a non-Euclidean space, so a vanilla GLM tends not to provide a very good fit. I was thinking of using Laplacian regularisation to improve the fit: my understanding is that I can add a regularisation term of the form \alpha \beta' L \beta, where \beta is the vector of regression coefficients, L is the Laplacian of my non-Euclidean manifold, and \alpha is the regularisation strength.

Is there a package that can do this? My understanding is that GLM.jl does not support regularisation. I also looked at MLJLinearModels.jl, but I couldn't figure out how to add custom regularisation.
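Written out, the objective I'm minimizing is the (averaged) Poisson negative log-likelihood plus the Laplacian penalty,

$$
\ell(\beta) = \frac{1}{n}\sum_{i=1}^{n}\Big[\exp(x_i^\top \beta) - y_i\, x_i^\top \beta + \log(y_i!)\Big] + \alpha\, \beta^\top L \beta,
$$

where $x_i$ is the $i$-th column of $X$; this is exactly what `lossfunc2` in the code below computes.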
I basically got something working by doing the following:
using ReverseDiff
using ADTypes
using Optim
using Random
using StatsBase
using SpecialFunctions
function lossfunc2(β::AbstractVector{<:Real}, X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real}, L::AbstractMatrix{<:Real}, α::Real)
    η = X' * β
    λ = exp.(η)
    # averaged Poisson negative log-likelihood ...
    ll1 = mean(-y .* η .+ loggamma.(y .+ 1) .+ λ)
    # ... plus the Laplacian penalty
    ll2 = α * β' * L * β
    ll1 + ll2
end
function fit_glm_2(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        # β0 was supplied without an intercept coefficient; append one
        push!(β0, randn(T))
    end
    # pad L with a zero row/column so the intercept is not penalised
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    # append a row of ones to X for the intercept
    Xq = [X; ones(T, 1, n)]
    lf(β) = lossfunc2(β, Xq, y, L2, α)
    q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true); autodiff=AutoReverseDiff())
end
function test()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_2(Xe, xe, L; α=0.01)
end
test()
I would like to speed it up, though, since I’m looking at fitting at least a few hundred models with a much larger parameter space. Does anyone know how to do the same fit using Reactant with Enzyme?
The right way is to use Reactant and Enzyme directly by compiling your entire optimization loop (ping @avikpal).
In the meantime I’m working on an easy way which requires minimal code changes but only compiles the gradient computation, so it will be slower than whatever Avik can offer you:
- Install the development version of DifferentiationInterface.jl as follows:
using Pkg
Pkg.add(;
    url="https://github.com/JuliaDiff/DifferentiationInterface.jl",
    subdir="DifferentiationInterface",
    rev="gd/reac",
)
- Replace `AutoReverseDiff()`[^1] with `AutoReactant(AutoEnzyme(; function_annotation = Enzyme.Const))` (see the sketch after this list).
- Relax your type constraints from `<:Real` to `<:Number` so that Reactant numbers are accepted.
- Enjoy.
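For concreteness, the changed `optimize` call inside `fit_glm_2` would look roughly like this (a sketch; I'm assuming the dev branch above provides `AutoReactant`, and that Enzyme and Reactant are loaded):

```julia
using Enzyme, Reactant  # plus the dev DifferentiationInterface installed above

# same loss and starting point as before; only the AD backend changes
q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true);
             autodiff=AutoReactant(AutoEnzyme(; function_annotation=Enzyme.Const)))

# cheaper alternative without Reactant (see footnote): a compiled ReverseDiff tape
# autodiff=AutoReverseDiff(; compile=true)
```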
Right now this is still experimental, and it seems not to be worth it on your small problem (the Reactant compilation takes up most of the time, especially because DI needs to duplicate it x4). I assume that if you run your optimization longer or on much larger problems, it will become relevant. But ideally you’d mutualize Reactant compilation across solves, and I don’t think Optim.jl allows you to do that.
[^1]: You would also get a sizeable benefit by just using `AutoReverseDiff(; compile=true)`, since your code has no control flow.
Optimization.jl iirc lets you pass in a custom gradient function. You can get the best of all worlds by using Reactant to compile the gradient once, and then it can be re-used for as many models as you like.
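A rough sketch of that pattern, reusing `lossfunc2` from above (hypothetical: `grad_loss` is just an illustrative name, and I haven't checked how the compiled gradient interacts with the plain vectors Optimization.jl hands back):

```julia
using Enzyme, Reactant

# gradient of the loss with respect to β only; data and α are held constant
grad_loss(β, X, y, L, α) =
    Enzyme.gradient(Enzyme.Reverse, Const(lossfunc2), β,
                    Const(X), Const(y), Const(L), Const(α))[1]

# β0, Xq, y, L2, α as constructed in fit_glm_2 above:
# move the (fixed-size) problem data to Reactant arrays and compile the gradient once
βr, Xr, yr, Lr = Reactant.to_rarray.((β0, Xq, y, L2))
cgrad = Reactant.@compile grad_loss(βr, Xr, yr, Lr, α)

# `cgrad` can then be wrapped as the `grad` of an OptimizationFunction and
# reused for every model of the same size
```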
Though longer term, Optimization.jl ought to add first-class Reactant support so it can fuse the actual parameter update with the gradient (which will not be available in DI).
Even longer term, Reactant ought to be able to compile the entire fit_glm_2 function, with AutoEnzyme used inside it (though I would hazard a guess that this doesn't work at the moment and requires some work from the Optimization.jl folks).
Long term it would be something like:
using Enzyme
using Reactant
using ADTypes
using Optim
using Random
using StatsBase
using SpecialFunctions
function lossfunc2(β::AbstractVector, X::AbstractMatrix, y::AbstractVector, L::AbstractMatrix, α)
    η = X' * β
    λ = exp.(η)
    ll1 = mean(-y .* η .+ loggamma.(y .+ 1) .+ λ)
    ll2 = α * β' * L * β
    ll1 + ll2
end
function fit_glm_2(X::AbstractMatrix, y::AbstractVector, L::AbstractMatrix; β0=nothing, α=1.0)
    d, n = size(X)
    T = eltype(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β) = lossfunc2(β, Xq, y, L2, α)
    q = optimize(lf, β0, LBFGS(), Optim.Options(; show_trace=true); autodiff=AutoEnzyme())
end
function test()
    Xe = Reactant.to_rarray(randn(300, 1000))
    xe = Reactant.to_rarray(rand(1:5, 1000))
    L = Reactant.to_rarray(randn(300, 300))
    cfit_glm_2 = Reactant.@compile fit_glm_2(Xe, xe, L; α=0.01)
    cfit_glm_2(Xe, xe, L; α=0.01)
end
test()
but it looks like there's a missing promote in Optim:
[3] promote_objtype
@ ~/.julia/packages/Optim/gmigl/src/multivariate/optimize/interface.jl:70 [inlined]
You can also just spend 5 minutes to work out the reverse-mode gradient by hand (also a good excuse to learn some matrix calculus). AD is a great tool, but writing a simple derivative like this yourself gives you a lot more control, especially if there are big matrices involved, and it will also help you understand what AD is doing.
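For the loss above the derivation is short: with η = X'β and λ = exp.(η), the gradient is X(λ − y)/n + α(L + L')β, which reduces to X(λ − y)/n + 2αLβ when L is symmetric (as a graph Laplacian is). As a sketch in code (`lossgrad` is just an illustrative name):

```julia
# hand-derived gradient of lossfunc2, assuming a symmetric L
function lossgrad(β, X, y, L, α)
    λ = exp.(X' * β)
    X * (λ .- y) ./ length(y) .+ 2α .* (L * β)
end
```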
This is something I try to do in general, actually. For this particular problem, finding the analytical gradient is not hard, but I'm also interested in how to solve more general problems using optimisation with Enzyme/Reactant.
I gave this solution a try, but I ran into an error:
"""
Same as fit_glm_2 but using Optimization with defined gradient
"""
function fit_glm_3(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β, p) = lossfunc2(β, p...)
    function enzyme_gradient(β0, (Xe, xe, L, α))
        return Enzyme.gradient(Enzyme.Reverse, Const(lossfunc2), β0, Const(Xe), Const(xe), Const(L), Const(α))[1]
    end
    qf = OptimizationFunction(lf; grad=enzyme_gradient)
    prob = OptimizationProblem(qf, β0, (Xq, y, L2, α))
    sol = solve(prob, LBFGS())
end
function test3()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_3(Xe, xe, L; α=0.01)
end
julia> test3()
ERROR: MethodError: no method matching (::var"#enzyme_gradient#31")(::Vector{Float64}, ::Vector{Float64}, ::Tuple{Matrix{Float64}, Vector{Int64}, Matrix{Float64}, Float64})
The function `enzyme_gradient` exists, but no method is defined for this combination of argument types.
Closest candidates are:
(::var"#enzyme_gradient#31")(::Any, ::Any)
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:47
Stacktrace:
[1] (::OptimizationBase.var"#grad#204"{Tuple{…}, OptimizationFunction{…}})(G::Vector{Float64}, x::Vector{Float64})
@ OptimizationBase ~/.julia/packages/OptimizationBase/77acV/src/function.jl:121
[2] (::OptimizationOptimJL.var"#8#14"{OptimizationCache{…}, OptimizationOptimJL.var"#7#13"{…}})(G::Vector{Float64}, θ::Vector{Float64})
@ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/2AQnq/src/OptimizationOptimJL.jl:175
[3] value_gradient!!(obj::TwiceDifferentiable{Float64, Vector{Float64}, Matrix{Float64}, Vector{Float64}}, x::Vector{Float64})
@ NLSolversBase ~/.julia/packages/NLSolversBase/n7XXO/src/interface.jl:82
[4] initial_state(method::LBFGS{…}, options::Optim.Options{…}, d::TwiceDifferentiable{…}, initial_x::Vector{…})
@ Optim ~/.julia/packages/Optim/gmigl/src/multivariate/solvers/first_order/l_bfgs.jl:168
[5] optimize(d::TwiceDifferentiable{…}, initial_x::Vector{…}, method::LBFGS{…}, options::Optim.Options{…})
@ Optim ~/.julia/packages/Optim/gmigl/src/multivariate/optimize/optimize.jl:43
[6] __solve(cache::OptimizationCache{OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, LBFGS{…}, Bool, OptimizationOptimJL.var"#3#5", Nothing})
@ OptimizationOptimJL ~/.julia/packages/OptimizationOptimJL/2AQnq/src/OptimizationOptimJL.jl:227
[7] solve!
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:234 [inlined]
[8] #solve#741
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:131 [inlined]
[9] solve
@ ~/.julia/packages/SciMLBase/JxpZc/src/solve.jl:128 [inlined]
[10] fit_glm_3(X::Matrix{Float64}, y::Vector{Int64}, L::Matrix{Float64}; β0::Nothing, α::Float64)
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:52
[11] fit_glm_3
@ ~/Documents/programming/julia/Hippocampus/src/test3.jl:36 [inlined]
[12] test3()
@ Main ~/Documents/programming/julia/Hippocampus/src/test3.jl:66
[13] top-level scope
@ REPL[62]:1
Some type information was truncated. Use `show(err)` to see complete types.
Sorry, my fault for not reading the manual properly. The gradient function is assumed to be in-place by default, so its first argument should be a vector to hold the gradient values. This works:
"""
Same as fit_glm_2 but using Optimization with defined gradient
"""
function fit_glm_3(X::AbstractMatrix{T}, y::AbstractVector{<:Integer}, L::Matrix{<:Real}; β0::Union{Nothing,Vector{T}}=nothing, α::T=one(T)) where T <: Real
    d, n = size(X)
    if β0 === nothing
        β0 = randn(T, d + 1)
    elseif size(β0, 1) == size(X, 1)
        push!(β0, randn(T))
    end
    L2 = zeros(T, size(L, 1) + 1, size(L, 2) + 1)
    L2[1:size(L, 1), 1:size(L, 2)] .= L
    Xq = [X; ones(T, 1, n)]
    lf(β, p) = lossfunc2(β, p...)
    function enzyme_gradient!(G, β0, (Xe, xe, L, α))
        Enzyme.autodiff(Enzyme.Reverse, Const(lossfunc2), Duplicated(β0, G), Const(Xe), Const(xe), Const(L), Const(α))
    end
    qf = OptimizationFunction(lf; grad=enzyme_gradient!)
    prob = OptimizationProblem(qf, β0, (Xq, y, L2, α))
    sol = solve(prob, LBFGS())
end
function test3()
    Xe = randn(300, 1000)
    xe = rand(1:5, 1000)
    L = randn(300, 300)
    fit_glm_3(Xe, xe, L; α=0.01)
end
Hm, looks like I spoke too soon. LBFGS() returned a failure code without throwing any errors. If I switch to OptimizationOptimisers.Adam, it looks like the objective function is no longer called with parameters?
I simplified my example to the Rosenbrock function from the Optimization.jl documentation, but using Enzyme to compute the gradient:
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

function solve_rosenbrock()
    x0 = zeros(2)
    _p = [1.0, 100.0]
    function enzyme_gradient!(G, x0, p)
        Enzyme.autodiff(Enzyme.Reverse, Const(rosenbrock), Duplicated(x0, G), Const(p))
    end
    optf = SciMLBase.OptimizationFunction(rosenbrock; grad=enzyme_gradient!)
    prob = SciMLBase.OptimizationProblem(optf, x0, _p)
    sol = solve(prob, OptimizationOptimisers.Adam(0.05), maxiters=1000, progress=false)
end
julia> sol = solve_rosenbrock()
ERROR: MethodError: no method matching rosenbrock(::Vector{Float64})
The function `rosenbrock` exists, but no method is defined for this combination of argument types.
Closest candidates are:
rosenbrock(::Any, ::Any)
@ Main ~/Documents/programming/julia/Hippocampus/src/test4.jl:6
Stacktrace:
[1] (::OptimizationFunction{…})(args::Vector{…})
@ SciMLBase ~/.julia/packages/SciMLBase/hHrRi/src/scimlfunctions.jl:4261
[2] __solve(cache::OptimizationCache{Optimisers.Adam{…}, true, OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, Bool, OptimizationOptimisers.var"#3#5", Nothing})
@ OptimizationOptimisers ~/.julia/packages/OptimizationOptimisers/atr4L/src/OptimizationOptimisers.jl:89
[3] solve!(cache::OptimizationCache{Optimisers.Adam{…}, true, OptimizationFunction{…}, OptimizationBase.ReInitCache{…}, Nothing, Nothing, Nothing, Nothing, Nothing, Bool, OptimizationOptimisers.var"#3#5", Nothing})
@ OptimizationBase ~/.julia/packages/OptimizationBase/R2xIG/src/solve.jl:205
[4] solve(::OptimizationProblem{…}, ::Optimisers.Adam{…}; kwargs::@Kwargs{…})
@ OptimizationBase ~/.julia/packages/OptimizationBase/R2xIG/src/solve.jl:96
[5] solve_rosenbrock()
@ Main ~/Documents/programming/julia/Hippocampus/src/test4.jl:16
[6] top-level scope
@ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.