Nesting GLM.jl inside the objective function when building a ReverseDiff gradient generates a StackOverflowError

Hi, in my objective function for an optimization problem, one of the steps feeds the parameters into a Poisson regression as weights, which I implemented with the GLM.jl package. However, when generating the gradient with ReverseDiff.jl, I get the following error:

ERROR: StackOverflowError:
Stacktrace:
[1] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
@ GLM ~/.julia/packages/GLM/6ycOV/src/glmfit.jl:67
[2] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
@ GLM ~/.julia/packages/GLM/6ycOV/src/glmfit.jl:69

The MWE is below:

using DataFrames, GLM, Optim, ReverseDiff, Random, Distributions, Statistics  # Statistics for mean

Random.seed!(12345)
count_data = DataFrame(Y = rand(Poisson(5), 1000), X1 = rand(1:10, 1000),
                       X2 = rand(LogNormal(3, 2), 1000), X3 = randn(1000),
                       X4 = rand(Normal(0, 2), 1000), w = rand(1000))
function coef_collect(data, para)
    # para[3] scales the weights, so the tracked parameter enters glm() via wts
    data[:, :w_para] = data[:, :w] .* para[3]
    collect(coef(glm(@formula(Y ~ X1 + X2 + X3), data, Poisson(); wts = data[:, :w_para])))
end

function predict_data(data, para)
    # predict with the fitted coefficients: intercept column plus X1, X2, X3
    [ones(nrow(data)) Matrix(data[:, [:X1, :X2, :X3]])] * coef_collect(data, para)
end


function optimize_func(para, y, data)
    # MSE between observed y and the model para[1]*prediction + para[2]*X4
    mean((y .- predict_data(data, para) .* para[1] .- data[:, :X4] .* para[2]) .^ 2)
end

para_true = [3.0, 4.0, 1.5]
y_true = predict_data(count_data, para_true) .* para_true[1] .+ count_data[:, :X4] .* para_true[2] .+ randn(1000)

para_0 = [5.0,6.0, 1.3]

f_tape = ReverseDiff.GradientTape(para->optimize_func(para, y_true, count_data), para_0)

Thanks for your help!

I assume that this means that ReverseDiff cannot automatically differentiate through the GLM package, but others might have a suggestion. I haven’t used GLM or ReverseDiff very much.
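A quick way to check that in isolation (a sketch, not something I've dug into deeply): differentiate a function whose only input is the weights, so glm receives a TrackedArray directly.

using DataFrames, GLM, ReverseDiff, Distributions, Random
Random.seed!(1)
df = DataFrame(Y = rand(Poisson(5), 100), X1 = randn(100))
# ReverseDiff passes a TrackedArray as w, which reaches GlmResp via wts
f(w) = sum(coef(glm(@formula(Y ~ X1), df, Poisson(); wts = w)))
ReverseDiff.gradient(f, rand(100))  # should hit the same GlmResp StackOverflowError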

(I’ve moved this from the optimization to the statistics category, since more people there might have a suggestion.)

Hi @odow, thanks for your help and for moving the question to the appropriate category! I'm guessing that's the reason too: an incompatibility between the two packages.

I also tried writing my own Poisson regression with Optim.jl, but there seems to be a large amount of compilation time on every run, and I'm not sure why. Here's my code:

function poisson_ll(y, X, ω, para)
    # negative weighted Poisson log-likelihood, dropping the log(y!) term
    # (constant in para)
    -sum(ω .* (y .* (X * para) .- exp.(X * para)))
end
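For reference, this is the weighted Poisson negative log-likelihood up to the log(y!) constant, and its gradient has the closed form that poisson_g2! below evaluates column by column:

$$
\ell(\beta) = -\sum_i \omega_i \left( y_i \, x_i^\top \beta - e^{x_i^\top \beta} \right),
\qquad
\nabla \ell(\beta) = -X^\top \left( \omega \odot \left( y - e^{X\beta} \right) \right)
$$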

function poisson_g2!(G, para, y, X, ω)
    # the residual ω .* (y .- exp.(X*para)) is shared by all four components,
    # so compute it once
    r = ω .* (y .- exp.(X * para))
    G[1] = -sum(view(X, :, 1) .* r)
    G[2] = -sum(view(X, :, 2) .* r)
    G[3] = -sum(view(X, :, 3) .* r)
    G[4] = -sum(view(X, :, 4) .* r)
end
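Equivalently, the whole gradient is a single matrix product (a sketch; poisson_g! is just an illustrative name):

function poisson_g!(G, para, y, X, ω)
    # G = -X' * (ω ⊙ (y - exp(Xβ))), i.e. the formula above in one line
    G .= -(X' * (ω .* (y .- exp.(X * para))))
end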

using LineSearches  # needed for BackTracking

Optim.minimizer(optimize(
    para -> poisson_ll(y, data, ω, para),
    (G, para) -> poisson_g2!(G, para, y, data, ω),
    zeros(size(data, 2)),
    LBFGS(; linesearch = LineSearches.BackTracking(order = 3)),
    Optim.Options(store_trace = false, extended_trace = false, show_trace = false,
                  iterations = 1_000_000, x_tol = 1e-15)))

Even after the first run has compiled everything, subsequent runs still report a sizable share of time as compilation. Is this normal for Optim.jl?

  0.053325 seconds (3.01 k allocations: 22.129 MiB, 37.59% gc time, 15.28% compilation time)

Hi!

Compilation overhead on the first run is normal.

I can't speak to the correctness of the code, but hoisting the repeated work and using the @. macro can give a big performance improvement.

I think this line of your code:

-sum(ω.*(y.*(X*para) .- exp.(X*para)))

has unnecessary allocations, mainly because X*para is computed twice.

Try something like this:

beta = X*para
-sum(@. ω*(y*beta  - exp(beta)))
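To shave off the remaining temporaries, you could also sum over a lazy generator, so beta is the only allocation (a sketch; poisson_ll2 is just an illustrative name):

function poisson_ll2(y, X, ω, para)
    beta = X * para
    # the generator is lazy, so no intermediate array is materialized
    -sum(ω[i] * (y[i] * beta[i] - exp(beta[i])) for i in eachindex(y))
end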

Hi @PharmCat, thanks for the allocation suggestion. The timing and allocation numbers above are not from the first run; they come from runs after everything has compiled, so I'm not sure what's happening.

I tried what you suggested, but I still see substantial compilation time:

  0.028127 seconds (2.76 k allocations: 17.967 MiB, 28.26% compilation time)

The anonymous functions are being compiled. Every time you re-evaluate a line containing para -> ..., Julia creates a brand-new closure type, so optimize has to be recompiled for it on every run. Define the closures once and reuse them:

optf = para -> poisson_ll(y, data, ω, para)
goptf = (G, para) -> poisson_g2!(G, para, y, data, ω)

@time Optim.minimizer(optimize(optf, goptf, zeros(size(data, 2)),
    LBFGS(; linesearch = LineSearches.BackTracking(order = 3)),
    Optim.Options(store_trace = false, extended_trace = false, show_trace = false,
                  iterations = 1_000_000, x_tol = 1e-15)))
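Alternatively, wrap everything in a function: closures defined inside a function body have a fixed type, so only the first call compiles (a sketch; fit_poisson is just an illustrative name):

function fit_poisson(y, X, ω)
    optf = para -> poisson_ll(y, X, ω, para)
    goptf = (G, para) -> poisson_g2!(G, para, y, X, ω)
    Optim.minimizer(optimize(optf, goptf, zeros(size(X, 2)),
        LBFGS(; linesearch = LineSearches.BackTracking(order = 3))))
end

fit_poisson(y, data, ω)        # first call compiles
@time fit_poisson(y, data, ω)  # later calls should show no compilation time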