Memory allocation in iterator

There are already several discussions about memory allocations in iterators but I have a very simple question that bothers me whenever I implement an iterator. Take the example iterator from the manual:

julia> struct Squares
           count::Int
       end

julia> Base.iterate(S::Squares, state=1) = state > S.count ? nothing : (state*state, state+1)

Let’s initialize:

julia> I = Squares(10); a,state = iterate(I);

Now, why does the iterate function allocate memory?

julia> @time iterate(I,state);
  0.000002 seconds (1 allocation: 32 bytes)

This scales:

julia> I = Squares(10000)
Squares(10000)

julia> @time for x in I
       nothing
       end
  0.000515 seconds (29.47 k allocations: 616.688 KiB)

616.688 KiB for just looping through squares seems to be absurd. I feel the memory allocation comes from the tuple creation of the return value. But there must be a way to avoid this. How?

The allocations are there because you’re running things in the global scope, and code from elsewhere can affect the global scope e.g. julia> I = ("blah", "blah"). When the type of a variable cannot be assumed to stay the same, Julia usually needs to allocate something with an unfixed type and size.

On the other hand, local scopes are mostly isolated. When you run this code in a local scope like a function or a let block, the allocations aren’t needed:

let
  I = Squares(10); a,state = iterate(I);
  @time iterate(I,state)
  @time for x in I
    nothing
  end
end

Replace let with begin when you copy and paste to see the global scope code behavior again.

1 Like

Oh, I’ve never heard of that before. So, should I use this local scope when debugging an iterator for performance?

That depends. If you’re going to run things in local scope, usually in a function but also in most blocks, then you should test things in a local scope (temporary test function, pasting in let block) for an accurate view of the performance. If you’re planning to run code in a global scope, then you should test it in the global scope (begin block).

You’ll usually want to wrap code in a function that returns the results you want, but running code in the global scope isn’t necessarily a bad thing, especially if the performance penalty isn’t large and it only ever runs once with the file.

Use BenchmarkTools.jl for benchmarking instead of just raw @time. It will take care of gathering statistics to reduce jitter due to other programs on your computer taking CPU time/memory, handle GC and (most importantly) wraps your code in a function (i.e. creates a local scope) that it executes for you.

EDIT: for comparison:

julia> @benchmark iterate(I, state) setup=(I = Squares(10); a = iterate(I); state=a[1]) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.000 ns …  1.800 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     26.000 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.946 ns ± 20.915 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃ ▆ █ ▇ ▄ ▁ ▁ ▁   ▁ ▁ ▁ ▁ ▁ ▁                             ▂
  ▇▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁█▁▇▁▆▁▇▁█▁█▁▇▁▆▁▇▁▆▁▇▁▆▁▆▁▇▁▆ █
  23 ns        Histogram: log(frequency) by time        52 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Hm, I think I still don’t fully understand this. For the iterate function maybe I understand your comment. But

julia> I = Squares(1000);

julia> @time for x in I
              nothing
              end
  0.000054 seconds (2.47 k allocations: 54.188 KiB)

would be something I would use in the global scope all the time. I don’t understand where there should be type issues. Also, actually why does

julia> @time for x in Squares(1000)
              nothing
              end
  0.000001 seconds

not use any allocations? Huh?

I checked with BenchmarkTools but there’s no difference:

julia> @btime for x in I
              nothing
              end
  46.214 μs (2468 allocations: 54.19 KiB)

Because there is no global variable involved. The Squares(1000) part is a constant that’s only used during the loop, it will not exist afterwards anymore. It never even enters global scope.

You’re accessing a global variable - interpolate it using $I:

julia> I = Squares(1000)
Squares(1000)

julia> @btime for x in $I
           nothing
       end
  304.864 ns (0 allocations: 0 bytes)

Accessing a non-const global variable may have to allocate because the value (& the type) of the binding (i.e. the name I in global scope) may change at any time.

The setup argument to the @benchmark macro I used above pastes this setup code in front of the timing part, all inside a function of its own. Thus no global variables are involved when using the setup part.

Oh boy, and I thought I knew some Julia. Thanks for the explanations. I’ll probably need a bit to let this sink in.

Well it is mentioned at the very top of the performance tips :person_shrugging:

Like I said, it’s the non-constness that’s the problem here, since inference has to assume Any and insert type lookups, dynamic dispatch… all that shebang. If you const the global, the allocations also go away:

julia> const J = Squares(1000)
Squares(1000)

julia> @btime for x in J
           nothing
       end
  2.752 ns (0 allocations: 0 bytes)

So yeah - 1) don’t use global scope and 2) if you have to, make sure to have your bindings const, to at least fix the type in place (you’ll still have dynamic lookups (unless it’s small enoguh and got inlined, which the compiler is allowed to do for const declared globals) on each access since the value of the global may have changed, which is why it’s best to avoid them entirely).

You can still change the value of the constant, but here be dragons:

julia> J = Squares(10) # can change the value
WARNING: redefinition of constant J. This may fail, cause incorrect answers, or produce other errors.
Squares(10)

julia> J = 10 #  can't change the type
ERROR: invalid redefinition of constant J
Stacktrace:
 [1] top-level scope
   @ REPL[11]:1

I might’ve confused you a bit because what I wrote seemed to imply that putting code in a local scope alone makes all the types inferrable. But notice that in my example, the assignment I = Squares(10) is inside the let block a.k.a. it is a local variable. The compiler can treat the local I’s type as fixed because no I assignments in the global scope or other local scopes can affect it.

for x in I does create a local scope, but it doesn’t declare I inside the local scope; it accesses the global variable I and takes all its type uncertainty.