Why does my iterable allocate?

I ran into unexpected allocations in a complex iterator and was able to simplify it down into this MWE which uses complex logic to always yield nothing.

struct MyIter end

Base.iterate(::MyIter) = (nothing, [true])
function Base.iterate(::MyIter, i)
    for j in reverse(eachindex(i)) # 1:-1:1
        if i[j]                    # true
            for _ in 0:1 end       # Empty loop
            return nothing, i
        end
    end
end

f(n) = for x in zip(MyIter(), 1:n) end

@time f(1_000_000)
#  0.024644 seconds (1.00 M allocations: 15.259 MiB)

I can eliminate the allocations in this case by deleting the empty loop. My question is why are there allocations in the first place?

1 Like

[true]?

3 Likes

[true] allocates a vector. You iterate 1 million times, you get 1 million allocations.

That’s not it because that function is only called once:

Base.iterate(::MyIter) = (nothing, (println("Once"); [true]))
@time f(1_000_000)
#Once
#  0.023507 seconds (1.00 M allocations: 15.259 MiB)
1 Like

The problem is that your iterate function is not type-stable:

julia> iterate(MyIter(), [true])
(nothing, Bool[1])

julia> iterate(MyIter(), [false])
nothing

That means that the compiler doesn’t know (until runtime) what type the return value will be, and has to allocate a “box” to hold the return value + type tag on every call.

@code_warntype iterate(MyIter(), [true]) would also have told you this.

Add a return nothing, i line to the end of the iterate function an the allocations go away.

1 Like

Is it possible to write a type stable iterate function for an iterable with 10 elements?

My bad, iterate functions are essentially never type stable — they either return nothing or a tuple — but since there are only two possible return types the compiler handles it efficiently (or is supposed to).

2 Likes
julia> Base.iterate(::MyIter) = (nothing, (true,))

julia> @time f(1_000_000)
  0.029855 seconds
2 Likes

Yeah, that’s a bit weird.

Looks like it’s just the inlining heuristic (the empty loop is making the function “too complex” to automatically inline) — if you add @inline to your iterate definition then the allocations go away.

(I think that in order for the compiler to eliminate the allocations from the inherently non-typestable iterate function it needs to be inlined?)

8 Likes

Thanks! That explains why my example here allocates and adding an @inline annotation fixes the real example that was too complicated to post here.

Is this really the case? I was under the impression that small inferred type Unions were optimized to type-stable branches a.k.a. Union-splitting (second to last paragraph mentions Union{Nothing, T} in particular). It seems like it would work even without inlining.

The problem is maybe how the return value of the function is represented if it’s not inlined. If it’s not inlined, the function is compiled without knowing the context in which it is called — what does the return value look like for a Union if not a pointer to a box?

giordano’s earlier comment just replaced [true] with (true,) in the original code, and that removes the allocations even though the Union{Nothing, T} instability is still there. iterate generally infers as Union{Nothing, T} so I expect this generally happens, just something unusual is happening this time.

I expect that there is something like that prior to the type-checking, type-stable branches. But I wouldn’t expect it to be reallocated each iteration. This also isn’t entirely because [true] is mutable, using a Dict(1 => true) and some associated edits to iterate only uses 4 allocations for the 1 dictionary (Julia v1.7.3 to be clear).

...
       Base.iterate(::MyIter) = (nothing, Dict(1 => true))
       function Base.iterate(::MyIter, i)
           for j in 1:-1:1 # Dict doesn't have eachindex
...
julia> @time f(1_000_000)
  0.003991 seconds (4 allocations: 432 bytes)

julia> @time Dict(1 => true)
  0.000004 seconds (4 allocations: 432 bytes)

EDIT: okay turns out the replacement of reverse(eachindex(i)) in the original code with 1:-1:1, eachindex(i), or 1:length(i) also reduces allocations to 1. Conversely, replacing 1:-1:1 in the Dict version above with 1:length(i), which itself should not allocate, makes f(1_000_000) do 1M allocations again. “Too complex to inline” really sounds like an explanation for how several single changes to iterate would cause an extra allocation, maybe Union-splitting type inference is a separate thing from however boxes are implemented. Also forget what I said about the (true,) version, the @code_llvm f(1_000_000) was just optimized to just returning nothing.