# Compiler recursion limit

I’ve written some recursive code that compiles away due to (I believe) constant propagation. However, I’m afraid that if I run the code on a larger problem I will hit a compiler recursion limit and it will no longer compile away. Does anyone have a rule of thumb for how deep your code can recurse before compiler inference gives up and infers `Any`?

I know, I know, you probably want to see a MWE. I do have one, but it’s in a repo that is currently private, so I’d prefer not to share it at this time. If there is no precise number, like a `JULIA_INFERENCE_RECURSION_LIMIT`, a rule of thumb will do.

3 Likes

Maybe a very simple test case can help answer?

``````
julia> using Test

julia> f(::Val{0}) = 0
f (generic function with 1 method)

julia> f(::Val{N}) where {N} = f(Val{N-1}())
f (generic function with 2 methods)

julia> @inferred f(Val(10))
0

julia> @inferred f(Val(100))
0

julia> @inferred f(Val(1000))
Internal error: stack overflow in type inference of f(Base.Val{1000}).
This might be caused by recursion over very long tuples or argument lists.
0
``````

However, I assume the compiler behavior is more complex and typically depends on the actual function involved.

4 Likes

Ok, here’s my example. I wrote a function `topo_sort` that does a topological sort on a tuple of singleton types. There is a DAG relationship among the types, which is defined by overloads of a `children` function.

Here’s the implementation:

``````
function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, Base.tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false

function visit_node(order::Tuple, x)
    in_tuple(x, order) && return order
    new_order = visit_children(order, children(x))
    (x, new_order...)
end

function visit_children(order::Tuple, children::Tuple)
    new_order = visit_node(order, children[1])
    visit_children(new_order, Base.tail(children))
end

visit_children(order::Tuple, ::Tuple{}) = order

topo_sort(to_visit::Tuple) = _topo_sort((), to_visit)

function _topo_sort(order::Tuple, to_visit::Tuple)
    new_order = visit_node(order, to_visit[1])
    _topo_sort(new_order, Base.tail(to_visit))
end

_topo_sort(order::Tuple, unvisited::Tuple{}) = order
``````

I’ve come up with a fairly simple example DAG that sometimes compiles away and sometimes does not, depending on the order that I provide the singleton types to the `topo_sort` function. Here’s the DAG:

``````
struct A end
struct B end
struct C end
struct D end
struct E end
struct F end
struct G end

children(::A) = (B(), C())
children(::B) = (D(),)
children(::C) = (D(),)
children(::D) = (E(),)
children(::E) = (F(),)
children(::F) = (G(),)
children(::G) = ()
``````

And here we can see that sometimes `topo_sort` compiles away, and sometimes it does not:

``````
julia> using Test

julia> @code_typed topo_sort((A(), B(), C(), D(), E(), F(), G()))
CodeInfo(
1 ─     return (A(), C(), B(), D(), E(), F(), G())
) => Tuple{A, C, B, D, E, F, G}

julia> @inferred topo_sort((A(), B(), C(), D(), E(), F(), G()))
(A(), C(), B(), D(), E(), F(), G())

julia> @code_typed topo_sort((C(), F(), B(), A(), D(), E(), G()))
CodeInfo(
1 ─ %1 = invoke Main._topo_sort((C(), D(), E(), F(), G())::Tuple{C, D, E, F, G}, (B(), A(), D(), E(), G())::Tuple{B, A, D, E, G})::Tuple{Any, Vararg{Any}}
└──      return %1
) => Tuple{Any, Vararg{Any}}

julia> @inferred topo_sort((C(), F(), B(), A(), D(), E(), G()))
ERROR: return type Tuple{A, B, C, D, E, F, G} does not match inferred return type Tuple{Any, Vararg{Any}}
``````

Is there any way my implementation can be improved so that it is more likely to constant propagate in a wider variety of situations?

I don’t know if this is acceptable in your situation, but if you use `@assume_effects`,

``````
Base.@assume_effects :total topo_sort(to_visit::Tuple) = _topo_sort((), to_visit)
``````

then inference works:

``````
julia> @inferred topo_sort((C(), F(), B(), A(), D(), E(), G()))
(A(), B(), C(), D(), E(), F(), G())
``````
1 Like

For recursive tuple code like this, my rule of thumb is that the arguments should be decreasing in complexity on each recursive step. In other words, the red flag here is that (I think) your `new_order` is getting longer, whereas you want things that get shorter (like `children`).

I’m not sure how you’d refactor this to achieve that, unfortunately.
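As a minimal illustration of that rule of thumb (a toy example with a hypothetical name, not code from the thread): a recursion whose tuple argument only shrinks on each step tends to infer cleanly, because inference can see the recursion bottoming out.

``````julia
# Hypothetical example of the "arguments should shrink" rule of thumb:
# the tuple loses one element per call, so inference can prove the
# recursion terminates at the Tuple{} method.
shrink_sum(t::Tuple) = t[1] + shrink_sum(Base.tail(t))
shrink_sum(::Tuple{}) = 0
``````

By contrast, an accumulator like `new_order` that grows on each step hands inference ever-larger tuple types to track.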

6 Likes

The compiler won’t stop inferring recursive types, and the inference process currently appears to be linear in the depth of the type. That would make complete inference quadratic in the depth of the actual recursive structure, which you don’t want: you would end up with a complete copy of the tree structure inhabiting the type system, very slowly, and to no benefit.

The available solutions depend on the details of the type, but revolve around partially-parameterizing at some point in the process, or insisting on `Any`.
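As a hedged sketch of the "insisting on `Any`" option (my own toy example, not from the thread): `Base.inferencebarrier` hides a value’s type from inference, so a recursive structure is walked at runtime without being mirrored in the type domain.

``````julia
# Hypothetical example: count the nesting depth of tuples like (((),),).
# Base.inferencebarrier makes inference see the argument as Any, so the
# recursive tuple structure is never tracked in the type system.
count_depth(x) = x === () ? 0 : 1 + count_depth(Base.inferencebarrier(x[1]))
``````

The trade-off is that the result type is no longer inferred precisely, which is exactly the "insisting on `Any`" cost mentioned above.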

Here’s an interesting related example. Suppose we already have a tuple `order` that contains a topological sort of singleton structs, and we have a smaller tuple `xs` with a subset of the full topological sort (with no duplicates). However, the elements in `xs` are not topologically sorted. What we would like to do is return a new tuple where the elements of `xs` are topologically sorted. We can do this with the following recursive code.

Implementation:

``````
function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, Base.tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false

Base.@assume_effects :total function tuple_topo_sort(order, match, xs)
    if length(match) == length(xs)
        return match
    end

    first = order[1]

    if in_tuple(first, xs)
        tuple_topo_sort(Base.tail(order), (match..., first), xs)
    else
        tuple_topo_sort(Base.tail(order), match, xs)
    end
end
``````

Test case:

``````
using Test
using BenchmarkTools

for i in 1:100
    sym = Symbol("A$i")
    @eval struct $sym end
end

order_vec = map(1:100) do i
    sym = Symbol("A$i")
    eval(:($sym()))
end

order = tuple(order_vec...)
``````

The `tuple_topo_sort` function is inferred for this example:

``````
julia> @inferred tuple_topo_sort(order, (), (A75(), A50(), A25()))
(A25(), A50(), A75())
``````

However, it does not completely compile away:

``````
julia> @code_typed tuple_topo_sort(order, (), (A75(), A50(), A25()))
CodeInfo(
1 ─     nothing::Nothing
│       nothing::Nothing
│       Core._apply_iterate(Base.iterate, Base.argtail, order)::Tuple{A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, A62, A63, A64, A65, A66, A67, A68, A69, A70, A71, A72, A73, A74, A75, A76, A77, A78, A79, A80, A81, A82, A83, A84, A85, A86, A87, A88, A89, A90, A91, A92, A93, A94, A95, A96, A97, A98, A99, A100}
└──     return (A25(), A50(), A75())
) => Tuple{A25, A50, A75}
``````

We can see a `Core._apply_iterate` in there, and we can see its effect on the runtime of `tuple_topo_sort`:

``````
julia> @btime tuple_topo_sort(order, (), (A75(), A50(), A25()));
3.030 μs (0 allocations: 0 bytes)
``````

Any ideas on how to get this case to completely compile away?

I may look more deeply into this later, but I suggest you try what I did in TupleSorting.jl. In `src/TupleUtils.jl` there’s a short `tup_tail` method which is a replacement for `Base.tail` based on `ntuple` instead of on `argtail`.

You might also be interested to see the implementation of `ntuple(::Any, ::Val)`, which is basically this (look in `base/ntuple.jl` in the Julia source); I slightly simplified it by removing the `@generated` branch:

``````
function ntuple(f::F, ::Val{N}) where {F, N}
    N::Int
    (N >= 0) || throw(ArgumentError(LazyString("tuple length should be ≥ 0, got ", N)))
    Tuple(f(i) for i = 1:N)
end
``````

Side note: it’s a bit annoying to see all the trade-offs that Julia Base and the Julia ecosystem have to make between compilation cost, sysimage size, etc. on one side and runtime performance on the other. There should be a way to let users opt into maximum runtime performance while still using largely the same code paths.

3 Likes

Awesome, that worked! Thanks, @nsajko. Here’s the complete script:

Code
``````
using Test
using BenchmarkTools

function tail(x::Tuple{Any, Vararg{Any, N}}) where {N}
    f = let x = x
        i -> x[i + 1]
    end
    ntuple(f, Val(N))::NTuple{N, Any}
end

function in_tuple(x::S, t::Tuple{T, Vararg}) where {S, T}
    S === T || in_tuple(x, tail(t))
end

in_tuple(::T, t::Tuple{T, Vararg}) where {T} = true
in_tuple(_, t::Tuple{}) = false

Base.@assume_effects :total function tuple_topo_sort(order, match, xs)
    if length(match) == length(xs)
        return match
    end

    first = order[1]

    if in_tuple(first, xs)
        tuple_topo_sort(tail(order), (match..., first), xs)
    else
        tuple_topo_sort(tail(order), match, xs)
    end
end

for i in 1:100
    sym = Symbol("A$i")
    @eval struct $sym end
end

order_vec = map(1:100) do i
    sym = Symbol("A$i")
    eval(:($sym()))
end

order = tuple(order_vec...)
xs = (A75(), A50(), A25())
``````

Testing it in the REPL:

``````
julia> @inferred tuple_topo_sort(order, (), xs)
(A25(), A50(), A75())

julia> @code_typed tuple_topo_sort(order, (), xs)
CodeInfo(
1 ─     nothing::Nothing
│       nothing::Nothing
└──     return (A25(), A50(), A75())
) => Tuple{A25, A50, A75}

julia> @btime tuple_topo_sort($order, (), $xs);
1.432 ns (0 allocations: 0 bytes)

julia> bar(::Tuple{A25, A50, A75}) = 42;

julia> foo(order, xs) = bar(tuple_topo_sort(order, (), xs));

julia> @code_typed foo(order, xs)
CodeInfo(
1 ─     return 42
) => Int64

julia> @btime foo($order, $xs);
1.432 ns (0 allocations: 0 bytes)
``````

(Regarding the original topological sort, I realized that for my problem I can just keep an `order` tuple around and when new structs are added to the DAG I can insert them into the correct location in `order`, and that operation is not performance sensitive.)

2 Likes

Any recursion will blow the stack in the end. Is that how you noticed, with a segfault or something? You can change the stack size with `ulimit -s` (not sure how on Windows; from within a process with `setrlimit`, which may be cross-platform).

I didn’t know Python had this limitation and this solution:

> Increasing the recursion limit: Python has a default maximum recursion depth of 1000. If a function exceeds this limit, it can be increased using the `sys.setrecursionlimit(n)` function.

I doubt Julia has any specific limitations beyond what the hardware, or rather the OS, gives you. At least not in general, though maybe for compilation specifically (but why would it?).
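For what it’s worth, in Julia runtime recursion that exhausts the stack raises a catchable `StackOverflowError` rather than segfaulting. A quick check (hypothetical function name):

``````julia
# Unbounded runtime recursion throws StackOverflowError, which can be
# caught like any other exception.
blowup(n) = blowup(n + 1)

caught = try
    blowup(1)
    false
catch e
    e isa StackOverflowError
end
``````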

Interesting. Julia scales badly (because of constant propagation, if I recall correctly) on this benchmark from my friend Per: GitHub - nordlow/compiler-benchmark: Benchmarks compilation speeds of different combinations of languages and compilers.

The code and benchmark there are highly unusual, written just to exercise the compiler rather than as real-world code, so I (and others) don’t care enough about it. I believe the issue would go away at `-O0` if that really meant no optimizations, as it does for other languages. I understand constant propagation is the only optimization (or one of the few) performed at `-O0`.

The Julia compiler does eventually give up on inference for recursive and mutually recursive functions because otherwise the compile times could blow up. However, as @matthias314 pointed out, careful use of `Base.@assume_effects` can convince the compiler to keep inferring. I haven’t played around with the different settings yet, but my hypothesis is that the `:terminates_globally` setting by itself might be enough to get the compiler to completely infer a recursive function.
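Untested beyond a small case, but here is the shape of that hypothesis, applied to the `Val` recursion from earlier in the thread (with a fresh name `g` to avoid clashing with the earlier `f`; whether `:terminates_globally` alone lifts the recursion limit at large `N` is exactly the open question, not something verified here):

``````julia
# Sketch: assert only termination (not :total) and see whether that is
# enough to keep inference going on a deeply recursive call chain.
g(::Val{0}) = 0
Base.@assume_effects :terminates_globally g(::Val{N}) where {N} = g(Val(N - 1))
``````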

2 Likes