I ran into unexpected allocations in a complex iterator and was able to simplify it down into this MWE which uses complex logic to always yield nothing.
struct MyIter end
Base.iterate(::MyIter) = (nothing, [true])
function Base.iterate(::MyIter, i)
for j in reverse(eachindex(i)) # 1:-1:1
if i[j] # true
for _ in 0:1 end # Empty loop
return nothing, i
end
end
end
f(n) = for x in zip(MyIter(), 1:n) end
@time f(1_000_000)
# 0.024644 seconds (1.00 M allocations: 15.259 MiB)
I can eliminate the allocations in this case by deleting the empty loop. My question is why are there allocations in the first place?
That means that the compiler doesn’t know (until runtime) what type the return value will be, and has to allocate a “box” to hold the return value + type tag on every call.
@code_warntype iterate(MyIter(), [true]) would also have told you this.
Add a return nothing, i line to the end of the iterate function an the allocations go away.
My bad, iterate functions are essentially never type stable — they either return nothing or a tuple — but since there are only two possible return types the compiler handles it efficiently (or is supposed to).
Looks like it’s just the inlining heuristic (the empty loop is making the function “too complex” to automatically inline) — if you add @inline to your iterate definition then the allocations go away.
(I think that in order for the compiler to eliminate the allocations from the inherently non-typestable iterate function it needs to be inlined?)
Is this really the case? I was under the impression that small inferred type Unions were optimized to type-stable branches a.k.a. Union-splitting (second to last paragraph mentions Union{Nothing, T} in particular). It seems like it would work even without inlining.
The problem is maybe how the return value of the function is represented if it’s not inlined. If it’s not inlined, the function is compiled without knowing the context in which it is called — what does the return value look like for a Union if not a pointer to a box?
giordano’s earlier comment just replaced [true] with (true,) in the original code, and that removes the allocations even though the Union{Nothing, T} instability is still there. iterate generally infers as Union{Nothing, T} so I expect this generally happens, just something unusual is happening this time.
I expect that there is something like that prior to the type-checking, type-stable branches. But I wouldn’t expect it to be reallocated each iteration. This also isn’t entirely because [true] is mutable, using a Dict(1 => true) and some associated edits to iterate only uses 4 allocations for the 1 dictionary (Julia v1.7.3 to be clear).
...
Base.iterate(::MyIter) = (nothing, Dict(1 => true))
function Base.iterate(::MyIter, i)
for j in 1:-1:1 # Dict doesn't have eachindex
...
julia> @time f(1_000_000)
0.003991 seconds (4 allocations: 432 bytes)
julia> @time Dict(1 => true)
0.000004 seconds (4 allocations: 432 bytes)
EDIT: okay turns out the replacement of reverse(eachindex(i)) in the original code with 1:-1:1, eachindex(i), or 1:length(i) also reduces allocations to 1. Conversely, replacing 1:-1:1 in the Dict version above with 1:length(i), which itself should not allocate, makes f(1_000_000) do 1M allocations again. “Too complex to inline” really sounds like an explanation for how several single changes to iterate would cause an extra allocation, maybe Union-splitting type inference is a separate thing from however boxes are implemented. Also forget what I said about the (true,) version, the @code_llvm f(1_000_000) was just optimized to just returning nothing.