Nesting GLM.jl inside the objective function when building a ReverseDiff gradient generates a StackOverflowError

Hi, in my objective function for an optimization problem, one of the steps feeds the parameters into a Poisson regression as weights, which I implemented with the GLM.jl package. However, when generating the gradient with ReverseDiff.jl, I get the following error:

ERROR: StackOverflowError:
Stacktrace:
[1] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
@ GLM ~/.julia/packages/GLM/6ycOV/src/glmfit.jl:67
[2] GLM.GlmResp(y::Vector{Float64}, d::Poisson{Float64}, l::LogLink, off::Vector{Float64}, wts::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
@ GLM ~/.julia/packages/GLM/6ycOV/src/glmfit.jl:69

The MWE is below:

using DataFrames, GLM, Optim, ReverseDiff, Random, Distributions, Statistics  # Statistics for mean

Random.seed!(12345)
count_data = DataFrame(Y = rand(Poisson(5), 1000), X1 = rand(1:10, 1000),
                       X2 = rand(LogNormal(3, 2), 1000), X3 = randn(1000),
                       X4 = rand(Normal(0, 2), 1000), w = rand(1000))
function coef_collect(data, para)
    # para[3] scales the weights, so the tracked parameter enters glm() via wts
    data[:, :w_para] = data[:, :w] .* para[3]
    collect(coef(glm(@formula(Y ~ X1 + X2 + X3), data, Poisson(); wts = data[:, :w_para])))
end

function predict_data(data, para)
    # predict with the fitted coefficients: intercept column plus X1, X2, X3
    [ones(nrow(data)) Matrix(data[:, [:X1, :X2, :X3]])] * coef_collect(data, para)
end


function optimize_func(para, y, data)
    # MSE between observed y and the model para[1]*prediction + para[2]*X4
    mean((y .- predict_data(data, para) .* para[1] .- data[:, :X4] .* para[2]) .^ 2)
end

para_true = [3.0, 4.0, 1.5]
y_true = predict_data(count_data, para_true) .* para_true[1] .+ count_data[:, :X4] .* para_true[2] .+ randn(1000)

para_0 = [5.0,6.0, 1.3]

f_tape = ReverseDiff.GradientTape(para->optimize_func(para, y_true, count_data), para_0)

Thanks for your help!

I assume that this means that ReverseDiff cannot automatically differentiate through the GLM package, but others might have a suggestion. I haven’t used GLM or ReverseDiff very much.
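A quick way to check that in isolation (a sketch, not something I've dug into deeply): differentiate a function whose only input is the weights, so glm receives a TrackedArray directly.

using DataFrames, GLM, ReverseDiff, Distributions, Random
Random.seed!(1)
df = DataFrame(Y = rand(Poisson(5), 100), X1 = randn(100))
# ReverseDiff passes a TrackedArray as w, which reaches GlmResp via wts
f(w) = sum(coef(glm(@formula(Y ~ X1), df, Poisson(); wts = w)))
ReverseDiff.gradient(f, rand(100))  # should hit the same GlmResp StackOverflowError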

(I’ve moved this from the optimization to the statistics category, since more people there might have a suggestion.)

Hi @odow, thanks for your help and for moving the question to the appropriate category! I'm guessing that's the reason too: an incompatibility between the two packages.

I also tried writing my own Poisson regression with Optim.jl, but there seems to be a large amount of compilation time on every run, and I'm not sure why. Here's my code:

function poisson_ll(y, X, ω, para)
    # negative weighted Poisson log-likelihood, dropping the log(y!) term
    # (constant in para)
    -sum(ω .* (y .* (X * para) .- exp.(X * para)))
end
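For reference, this is the weighted Poisson negative log-likelihood up to the log(y!) constant, and its gradient has the closed form that poisson_g2! below evaluates column by column:

$$
\ell(\beta) = -\sum_i \omega_i \left( y_i \, x_i^\top \beta - e^{x_i^\top \beta} \right),
\qquad
\nabla \ell(\beta) = -X^\top \left( \omega \odot \left( y - e^{X\beta} \right) \right)
$$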

function poisson_g2!(G, para, y, X, ω)
    # the residual ω .* (y .- exp.(X*para)) is shared by all four components,
    # so compute it once
    r = ω .* (y .- exp.(X * para))
    G[1] = -sum(view(X, :, 1) .* r)
    G[2] = -sum(view(X, :, 2) .* r)
    G[3] = -sum(view(X, :, 3) .* r)
    G[4] = -sum(view(X, :, 4) .* r)
end
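Equivalently, the whole gradient is a single matrix product (a sketch; poisson_g! is just an illustrative name):

function poisson_g!(G, para, y, X, ω)
    # G = -X' * (ω ⊙ (y - exp(Xβ))), i.e. the formula above in one line
    G .= -(X' * (ω .* (y .- exp.(X * para))))
end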

using LineSearches  # needed for BackTracking

Optim.minimizer(optimize(
    para -> poisson_ll(y, data, ω, para),
    (G, para) -> poisson_g2!(G, para, y, data, ω),
    zeros(size(data, 2)),
    LBFGS(; linesearch = LineSearches.BackTracking(order = 3)),
    Optim.Options(store_trace = false, extended_trace = false, show_trace = false,
                  iterations = 1_000_000, x_tol = 1e-15)))

Even after the first run has compiled everything, subsequent runs still report a sizable share of time as compilation. Is this normal for Optim.jl?

  0.053325 seconds (3.01 k allocations: 22.129 MiB, 37.59% gc time, 15.28% compilation time)

Hi!

Compilation overhead on the first run is normal.

I can't speak to the correctness of the code, but hoisting the repeated work and using the @. macro can give a big performance improvement.

I think this line of your code:

-sum(ω.*(y.*(X*para) .- exp.(X*para)))

has unnecessary allocations, mainly because X*para is computed twice.

Try something like this:

beta = X*para
-sum(@. ω*(y*beta  - exp(beta)))
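To shave off the remaining temporaries, you could also sum over a lazy generator, so beta is the only allocation (a sketch; poisson_ll2 is just an illustrative name):

function poisson_ll2(y, X, ω, para)
    beta = X * para
    # the generator is lazy, so no intermediate array is materialized
    -sum(ω[i] * (y[i] * beta[i] - exp(beta[i])) for i in eachindex(y))
end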

Hi @PharmCat, thanks for the allocation suggestion. The timing and allocation numbers above are not from the first run; they come from runs after everything has compiled, so I'm not sure what's happening.

I tried what you suggested, but I still see substantial compilation time:

  0.028127 seconds (2.76 k allocations: 17.967 MiB, 28.26% compilation time)

The anonymous functions are being compiled. Every time you re-evaluate a line containing para -> ..., Julia creates a brand-new closure type, so optimize has to be recompiled for it on every run. Define the closures once and reuse them:

optf = para -> poisson_ll(y, data, ω, para)
goptf = (G, para) -> poisson_g2!(G, para, y, data, ω)

@time Optim.minimizer(optimize(optf, goptf, zeros(size(data, 2)),
    LBFGS(; linesearch = LineSearches.BackTracking(order = 3)),
    Optim.Options(store_trace = false, extended_trace = false, show_trace = false,
                  iterations = 1_000_000, x_tol = 1e-15)))
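Alternatively, wrap everything in a function: closures defined inside a function body have a fixed type, so only the first call compiles (a sketch; fit_poisson is just an illustrative name):

function fit_poisson(y, X, ω)
    optf = para -> poisson_ll(y, X, ω, para)
    goptf = (G, para) -> poisson_g2!(G, para, y, X, ω)
    Optim.minimizer(optimize(optf, goptf, zeros(size(X, 2)),
        LBFGS(; linesearch = LineSearches.BackTracking(order = 3))))
end

fit_poisson(y, data, ω)        # first call compiles
@time fit_poisson(y, data, ω)  # later calls should show no compilation time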