Compiler recursion limit

I’ve written some recursive code that compiles away due to (I believe) constant propagation. However, I’m afraid that if I run the code on a larger problem I will hit a compiler recursion limit and it will no longer compile away. Does anyone have a rule of thumb for how deep your code can recurse before compiler inference gives up and infers Any?

I know, I know, you probably want to see a MWE. I do have one, but it’s in a repo that is currently private, so I’d prefer not to share it at this time. If there is no precise number, like a JULIA_INFERENCE_RECURSION_LIMIT, a rule of thumb will do.

3 Likes

Maybe a very simple test case can help answer?

julia> using Test

julia> f(::Val{0}) = 0
f (generic function with 1 method)

julia> f(::Val{N}) where {N} = f(Val{N-1}())
f (generic function with 2 methods)

julia> @inferred f(Val(10))
0

julia> @inferred f(Val(100))
0

julia> @inferred f(Val(1000))
Internal error: stack overflow in type inference of f(Base.Val{1000}).
This might be caused by recursion over very long tuples or argument lists.
0

However, I assume the compiler's behavior is more complex and typically depends on the actual function involved.

4 Likes

Ok, here’s my example. I wrote a function topo_sort that does a topological sort on a tuple of singleton types. There is a DAG relationship among the types, which is defined by overloads of a children function.

Here’s the implementation:

function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, Base.tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false


function visit_node(order::Tuple, x)
    in_tuple(x, order) && return order
    new_order = visit_children(order, children(x))
    (x, new_order...)
end

function visit_children(order::Tuple, children::Tuple)
    new_order = visit_node(order, children[1])
    visit_children(new_order, Base.tail(children))
end

visit_children(order::Tuple, ::Tuple{}) = order


topo_sort(to_visit::Tuple) = _topo_sort((), to_visit)

function _topo_sort(order::Tuple, to_visit::Tuple)
    new_order = visit_node(order, to_visit[1])
    _topo_sort(new_order, Base.tail(to_visit))
end

_topo_sort(order::Tuple, unvisited::Tuple{}) = order

I’ve come up with a fairly simple example DAG that sometimes compiles away and sometimes does not, depending on the order that I provide the singleton types to the topo_sort function. Here’s the DAG:

struct A end
struct B end
struct C end
struct D end
struct E end
struct F end
struct G end

children(::A) = (B(), C())
children(::B) = (D(), )
children(::C) = (D(), )
children(::D) = (E(), )
children(::E) = (F(), )
children(::F) = (G(), )
children(::G) = ()

And here we can see that sometimes topo_sort compiles away, and sometimes it does not:

julia> using Test 

julia> @code_typed topo_sort((A(), B(), C(), D(), E(), F(), G()))
CodeInfo(
1 ─     return (A(), C(), B(), D(), E(), F(), G())
) => Tuple{A, C, B, D, E, F, G}

julia> @inferred topo_sort((A(), B(), C(), D(), E(), F(), G()))
(A(), C(), B(), D(), E(), F(), G())

julia> @code_typed topo_sort((C(), F(), B(), A(), D(), E(), G()))
CodeInfo(
1 ─ %1 = invoke Main._topo_sort((C(), D(), E(), F(), G())::Tuple{C, D, E, F, G}, (B(), A(), D(), E(), G())::Tuple{B, A, D, E, G})::Tuple{Any, Vararg{Any}}
└──      return %1
) => Tuple{Any, Vararg{Any}}

julia> @inferred topo_sort((C(), F(), B(), A(), D(), E(), G()))
ERROR: return type Tuple{A, B, C, D, E, F, G} does not match inferred return type Tuple{Any, Vararg{Any}}

Is there any way my implementation can be improved so that it is more likely to constant propagate in a wider variety of situations?

I don’t know if this is acceptable in your situation, but if you use @assume_effects,

Base.@assume_effects :total topo_sort(to_visit::Tuple) = _topo_sort((), to_visit)

then inference works:

julia> @inferred topo_sort((C(), F(), B(), A(), D(), E(), G()))
(A(), B(), C(), D(), E(), F(), G())
1 Like

For recursive tuple code like this, my rule of thumb is that the arguments should be decreasing in complexity on each recursive step. In other words, the red flag here is that (I think) your new_order is getting longer, whereas you want things that get shorter (like children).

I’m not sure how you’d refactor this to achieve that, unfortunately.
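To make the rule of thumb concrete, here's a minimal sketch (my own toy functions, not your code): a recursion whose tuple argument shrinks on each step infers cleanly, while one whose accumulator grows has the red-flag shape described above.

```julia
# Shrinking argument: each recursive call sees a strictly smaller tuple
# type, so inference terminates cleanly at any input length.
sumtup(t::Tuple{}) = 0
sumtup(t::Tuple) = t[1] + sumtup(Base.tail(t))

# Growing accumulator: the first argument's type gets *longer* on every
# step, the shape that tends to trip inference's recursion heuristics.
revtup(acc::Tuple, t::Tuple{}) = acc
revtup(acc::Tuple, t::Tuple) = revtup((t[1], acc...), Base.tail(t))
```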

6 Likes

The compiler won’t stop inferring recursive types, and the inference process currently appears to be linear in the depth of the type, which would make complete inference quadratic in the depth of the actual recursive structure: you don’t want this. You would end up with a complete copy of the tree structure inhabiting the type system, very slowly, and to no benefit.

The available solutions depend on the details of the type, but revolve around partially-parameterizing at some point in the process, or insisting on Any.
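As a sketch of the second option (hypothetical types, not from the thread): give the recursive field the type Any, so a node's type no longer encodes its depth.

```julia
# Fully parameterized: the type of a node encodes its entire tail, so a
# depth-N list has a depth-N nested type, and inference work grows with N.
struct Deep{T}
    head::Int
    tail::T
end

# "Insisting on Any": every node has the same concrete type regardless of
# depth, so inference cost per node is constant (at the price of dynamic
# dispatch whenever `tail` is read).
struct Flat
    head::Int
    tail::Any
end
```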

Here’s an interesting related example. Suppose we already have a tuple order that contains a topological sort of singleton structs, and we have a smaller tuple xs with a subset of the full topological sort (with no duplicates). However, the elements in xs are not topologically sorted. What we would like to do is return a new tuple where the elements of xs are topologically sorted. We can do this with the following recursive code.

Implementation:

function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, Base.tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false

Base.@assume_effects :total function tuple_topo_sort(order, match, xs)
    if length(match) == length(xs)
        return match
    end

    first = order[1]

    if in_tuple(first, xs)
        tuple_topo_sort(Base.tail(order), (match..., first), xs)
    else
        tuple_topo_sort(Base.tail(order), match, xs)
    end
end

Test case:

using Test
using BenchmarkTools

for i in 1:100
    sym = Symbol("A$i")
    @eval struct $sym end
end

order_vec = map(1:100) do i
    sym = Symbol("A$i")
    eval(:($sym()))
end

order = tuple(order_vec...)

The tuple_topo_sort function is inferred for this example:

julia> @inferred tuple_topo_sort(order, (), (A75(), A50(), A25()))
(A25(), A50(), A75())

However, it does not completely compile away:

julia> @code_typed tuple_topo_sort(order, (), (A75(), A50(), A25()))
CodeInfo(
1 ─     nothing::Nothing
│       nothing::Nothing
│       Core._apply_iterate(Base.iterate, Base.argtail, order)::Tuple{A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, A62, A63, A64, A65, A66, A67, A68, A69, A70, A71, A72, A73, A74, A75, A76, A77, A78, A79, A80, A81, A82, A83, A84, A85, A86, A87, A88, A89, A90, A91, A92, A93, A94, A95, A96, A97, A98, A99, A100}
└──     return (A25(), A50(), A75())
) => Tuple{A25, A50, A75}

We can see a Core._apply_iterate in there, and we can see its effect on the runtime of tuple_topo_sort:

julia> @btime tuple_topo_sort(order, (), (A75(), A50(), A25()));
  3.030 μs (0 allocations: 0 bytes)

Any ideas on how to get this case to completely compile away?

I may look more deeply into this later, but I suggest you try what I did in TupleSorting.jl. In src/TupleUtils.jl there’s a short tup_tail method which is a replacement for Base.tail based on ntuple instead of on argtail.

There’s also a PR: add a `Base.tail` method for nonempty tuples by nsajko · Pull Request #52035 · JuliaLang/julia · GitHub

You might also be interested to see the implementation of ntuple(::Any, ::Val), which is basically this (look in base/ntuple.jl in the Julia source); I’ve slightly simplified it by removing the @generated branch:

function ntuple(f::F, ::Val{N}) where {F,N}
    N::Int
    (N >= 0) || throw(ArgumentError(LazyString("tuple length should be ≥ 0, got ", N)))
    Tuple(f(i) for i = 1:N)
end

Side note: it’s a bit annoying seeing all the trade-offs between compilation cost, sysimage size, etc., on one side and runtime performance on the other that Julia Base and the Julia ecosystem have to make. There should be a way to somehow allow users to opt into maximum runtime performance while still using largely the same code paths.

3 Likes

Awesome, that worked! Thanks, @nsajko. Here’s the complete script:

Code
using Test
using BenchmarkTools

function tail(x::Tuple{Any, Vararg{Any, N}}) where {N}
    f = let x = x
        i -> x[i + 1]
    end
    ntuple(f, Val(N))::NTuple{N, Any}
end

function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false

Base.@assume_effects :total function tuple_topo_sort(order, match, xs)
    if length(match) == length(xs)
        return match
    end

    first = order[1]

    if in_tuple(first, xs)
        tuple_topo_sort(tail(order), (match..., first), xs)
    else
        tuple_topo_sort(tail(order), match, xs)
    end
end

for i in 1:100
    sym = Symbol("A$i")
    @eval struct $sym end
end

order_vec = map(1:100) do i
    sym = Symbol("A$i")
    eval(:($sym()))
end

order = tuple(order_vec...)
xs = (A75(), A50(), A25())

Testing it in the REPL:

julia> @inferred tuple_topo_sort(order, (), xs)
(A25(), A50(), A75())

julia> @code_typed tuple_topo_sort(order, (), xs)
CodeInfo(
1 ─     nothing::Nothing
│       nothing::Nothing
└──     return (A25(), A50(), A75())
) => Tuple{A25, A50, A75}

julia> @btime tuple_topo_sort($order, (), $xs);
  1.432 ns (0 allocations: 0 bytes)

julia> bar(::Tuple{A25, A50, A75}) = 42;

julia> foo(order, xs) = bar(tuple_topo_sort(order, (), xs));

julia> @code_typed foo(order, xs)
CodeInfo(
1 ─     return 42
) => Int64

julia> @btime foo($order, $xs);
  1.432 ns (0 allocations: 0 bytes)

(Regarding the original topological sort, I realized that for my problem I can just keep an order tuple around and when new structs are added to the DAG I can insert them into the correct location in order, and that operation is not performance sensitive.)

2 Likes

Any recursion will blow the stack in the end. Is that what you’re seeing, a segfault or something? You can change the stack size with ulimit -s (not sure how on Windows; from within a process there’s setrlimit, and maybe that’s cross-platform).

I didn’t know Python had this limitation/and this solution:

Increasing the recursion limit: Python has a default maximum recursion depth of 1000. If a function exceeds this limit, it can be increased using the sys.setrecursionlimit(n) function.

I doubt Julia has any specific limit beyond what the hardware, or rather the OS, gives you. At least not in general, though maybe for compilation specifically (but why would it?).

Interesting. Julia scales badly on this benchmark (because of constant propagation, if I recall) from my friend Per: GitHub - nordlow/compiler-benchmark: Benchmarks compilation speeds of different combinations of languages and compilers.

The code and benchmark there are highly unusual, written just to exercise the compiler rather than as real-world code, so I (and others) don’t care enough. I believe the issue would go away on -O0 if it really meant no optimizations, as it does for other languages; I understand constant propagation is the only optimization (or one of few) still done on -O0.

The Julia compiler does eventually give up on inference for recursive and mutually recursive functions because otherwise the compile times could blow up. However, as @matthias314 pointed out, careful use of Base.@assume_effects can convince the compiler to keep inferring. I haven’t played around with the different settings yet, but my hypothesis is that the :terminates_globally setting by itself might be enough to get the compiler to completely infer a recursive function.
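For example (untested hypothesis, applied to the Val recursion from earlier in the thread — the annotation goes on the recursive method):

```julia
# Base case: the recursion bottoms out at Val{0}.
g(::Val{0}) = 0

# Hypothesis: asserting termination may let inference keep unrolling the
# recursion instead of giving up at its usual depth heuristic.
Base.@assume_effects :terminates_globally g(::Val{N}) where {N} = g(Val{N - 1}())
```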

2 Likes