Experiments with Julia Stack Sizes and Enzyme

MilesCranmer · July 2, 2024, 10:00am

I’ve been trying to get Enzyme.jl working as an AD backend for SymbolicRegression.jl for a couple of years now, and recently got a working demonstration with a hack to increase the stack size: Extremely long compilation time on recursive, branching functions (DynamicExpressions.jl) · Issue #1156 · EnzymeAD/Enzyme.jl · GitHub.

Some background: I noticed it was hard to reproduce my issues in a MWE, and I also ran into a mysterious stack overflow once in a while with no debug info. It seemed I could only get the compilation freezes in the full, more complicated version of my code.

So in a sort of a last ditch effort to fix this issue I found this stackoverflow.com post about manually reserving stack space:

function with_stacksize(f::F, n) where {F<:Function}
    fetch(schedule(Task(f, n)))
end

with_stacksize(8 * 1024 * 1024) do  # 8 MB stack (default=4 MB)
    ...
end

Weirdly enough, this fixed the issue. I wrapped my Enzyme.jl call with it (full code here):

with_stacksize(8 * 1024 * 1024) do
    autodiff(
        Reverse,
        evaluator,
        Duplicated(g.f.tree, g.extra.storage_tree),
        Duplicated(g.f.dataset, g.extra.storage_dataset),
        Const(g.f.options),
        Const(g.f.idx),
        Duplicated(output, doutput),
    )
end

and, lo and behold, I could actually run SymbolicRegression.jl with Enzyme.jl as the AD. Even multi-threaded searches now works. And the derivatives are fast

So, some questions:

What could the issue be coming from? Is this simply too hard of a problem to differentiate with Enzyme using the normal Julia stack size?
Task(f, n) is undocumented. Is it safe to use? (Am I going to run into some mysterious segfaults in the future?)
What is the “proper” way to deal with a limited stack size in Julia?

MilesCranmer · July 3, 2024, 12:48am

Some experiments with stack sizes:

function test_recursion_depth(depth::Int, args...)
    depth >= 1000000000000 && return nothing
    print("\33[2K\rHello from depth $(depth)")
    test_recursion_depth(depth + 1, args...)
    print("$(depth) $(args)")  # Just to prevent LLVM removing args
end

So with no arguments, we can get up to a call depth of 86660:

julia> test_recursion_depth(0)
Hello from depth 86660ERROR: StackOverflowError:

Let’s see if we have 10 arguments:

julia> test_recursion_depth(0, ones(10)...)
Hello from depth 12998

meaning we hit a stack overflow at about 13,000 call depth.

And, for 100 arguments:

julia> test_recursion_depth(0, ones(100)...)
Hello from depth 6117

Does the type of the argument affect the stack frame size?

julia> test_recursion_depth(0, ones(BigInt, 100)...)
Hello from depth 6117

julia> test_recursion_depth(0, ntuple(_ -> ones(10), 100)...)
Hello from depth 6117

Nope, just the number of arguments. What about kwargs?

julia> function test_recursion_depth_kws(depth::Int; kws...)
           depth >= 1000000000000 && return nothing
           print("\33[2K\rHello from depth $(depth)")
           test_recursion_depth_kws(depth + 1; kws...)
           print("$(depth) $(kws)")
       end

julia> test_recursion_depth_kws(0; NamedTuple(Symbol("x$i") => 1 for i in 1:100)...)
Hello from depth 2511

The all-keywords example seems to have a ~2.4x larger stack frame size.

What about when you have local arguments within the function body?

julia> function test_recursion_depth_locals(::Val{n}, depth::Int, args...) where {n}
           depth >= 1000000000000 && return nothing
           print("\33[2K\rHello from depth $(depth)")
           a = ntuple(_ -> rand(), Val(n))
           test_recursion_depth_locals(Val(n), depth + 1, args...)
           print("$(depth) $(a) $(args)")  # Just to prevent LLVM removing args
       end;

Let’s see 1 value, 50 values, and 500 values in a tuple, in the function body:

julia> test_recursion_depth_locals(Val(1), 0, ones(10)...)
Hello from depth 12091

julia> test_recursion_depth_locals(Val(50), 0, ones(10)...)
Hello from depth 7646

julia> test_recursion_depth_locals(Val(500), 0, ones(10)...)
Hello from depth 1710

So the function body seems to affect the stack frame size quite a bit.

Okay, now lets try with different stack sizes via Task(f, n) (still undocumented, but seems to work pretty well) –

1 MB

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...), 1024 * 1024)))
Hello from depth 1532

4 MB (default for a task)

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...), 4 * 1024 * 1024)))
Hello from depth 6447

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...))))
Hello from depth 6447

8 MB (seems to be default for root task)

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...), 8 * 1024 * 1024)))
Hello from depth 13001

Note: it can get up a bit higher than the initial test – because the task has a fresh stack, whereas the root task has the REPL in its stack.

Now, 64 MB:

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...), 64 * 1024 * 1024)))
Hello from depth 104751

Or 1 GB:

julia> fetch(schedule(Task(() -> test_recursion_depth(0, ones(10)...), 1024 * 1024 * 1024)))
Hello from depth 1677615

(Note: it’s just a stack size – that memory is virtual. Only if you use that stack size do you actually allocate it.)

Topic		Replies	Views
Very long stack trace Performance	15	515	December 23, 2022
StaticTools Performance staticcompiler , statictoolsjl	2	304	April 3, 2024
Autodiff of vector inputs with Enzyme.jl (and possibly Optimization.jl) Optimization (Mathematical) question , optimization , sciml , enzyme	9	1062	August 16, 2023
[ANN] SymbolicRegression.jl 1.0.0 - Distributed High-Performance Symbolic Regression in Julia Package Announcements package , symbolic-regression	24	1498	November 29, 2024
Stack overflow in type inference... julia 1.9.0-DEV General Usage bug , stackoverflow	7	1667	May 14, 2022

Experiments with Julia Stack Sizes and Enzyme

Related topics