I’ve been trying to get Enzyme.jl working as an AD backend for SymbolicRegression.jl for a couple of years now, and recently got a working demonstration with a hack to increase the stack size: Extremely long compilation time on recursive, branching functions (DynamicExpressions.jl) · Issue #1156 · EnzymeAD/Enzyme.jl · GitHub.
Some background: I noticed it was hard to reproduce my issues in a MWE, and I also ran into a mysterious stack overflow once in a while with no debug info. It seemed I could only get the compilation freezes in the full, more complicated version of my code.
So, in a last-ditch effort to fix this, I found this stackoverflow.com post about manually reserving stack space:
function with_stacksize(f::F, n) where {F<:Function}
    fetch(schedule(Task(f, n)))
end
with_stacksize(8 * 1024 * 1024) do  # 8 MB stack (default=4 MB)
    ...
end
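For illustration, here is a minimal self-contained sketch of the helper in use. The recursive function depth_sum is a hypothetical stand-in for a deeply recursive evaluation (it is not from my actual code), and the sketch relies on the same undocumented second positional argument of Task, so treat that part as an assumption rather than a supported API:

```julia
# Run `f` on a fresh task whose stack is reserved at `n` bytes.
# NOTE: the second argument of `Task` is undocumented (assumption: reserved stack size in bytes).
function with_stacksize(f::F, n::Int) where {F<:Function}
    task = Task(f, n)
    return fetch(schedule(task))  # blocks until the task finishes, returns its value
end

# Hypothetical stand-in for a recursive, branching evaluation.
depth_sum(k) = k == 0 ? 0 : k + depth_sum(k - 1)

result = with_stacksize(64 * 1024 * 1024) do  # reserve 64 MB for this task
    depth_sum(100_000)
end

println(result)
```

Since the work runs on a separate task, an exception thrown inside the do block reaches the caller through fetch wrapped in a TaskFailedException rather than as the original exception type.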
Weirdly enough, this fixed the issue. I wrapped my Enzyme.jl call with it (full code here):
with_stacksize(8 * 1024 * 1024) do
    autodiff(
        Reverse,
        evaluator,
        Duplicated(g.f.tree, g.extra.storage_tree),
        Duplicated(g.f.dataset, g.extra.storage_dataset),
        Const(g.f.options),
        Const(g.f.idx),
        Duplicated(output, doutput),
    )
end
and, lo and behold, I could actually run SymbolicRegression.jl with Enzyme.jl as the AD backend. Even multi-threaded searches now work. And the derivatives are fast.
So, some questions:
- Where could the issue be coming from? Is this simply too hard of a problem to differentiate with Enzyme using the normal Julia stack size?
- Task(f, n) is undocumented. Is it safe to use? (Am I going to run into some mysterious segfaults in the future?)
- What is the “proper” way to deal with a limited stack size in Julia?