Memory allocation in generator with zip


I am seeing suspicious memory allocation when using a generator that contains a zip iterator. A minimal example:

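using BenchmarkTools, LinearAlgebra, Statistics  # on Julia 1.0 and later, dot and mean live in these stdlib packages

as = Vector{Float64}[randn(5) for i = 1:200000]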
bs = Vector{Float64}[randn(5) for i = 1:200000]

gen1 = (dot(a, b) for (a, b) in zip(as, bs))
gen2 = (dot(a, a) for a in as)

println(@benchmark mean(gen1))
println(@benchmark mean(gen2))

Using BenchmarkTools I get

  memory estimate:  6.10 mb
  allocs estimate:  200001
  minimum time:     5.282 ms (0.00% GC)
  median time:      5.801 ms (0.00% GC)
  mean time:        5.914 ms (2.72% GC)
  maximum time:     9.829 ms (0.00% GC)
  samples:          846
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

for the generator with the zip, and

  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     3.055 ms (0.00% GC)
  median time:      3.233 ms (0.00% GC)
  mean time:        3.294 ms (0.00% GC)
  maximum time:     5.912 ms (0.00% GC)
  samples:          1518
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

for the other one. This might be a known issue, but I would like to know why it happens and whether there is a way around it.



I know I'm late to the party.
I think the problem here is not actually the zip. You're computing two different things: gen2 uses only a, while gen1 uses both a and b. If you write:

gen1 = (dot(a, b) for (a, b) in zip(as, as))

then you get the same performance and memory usage as for gen2.

No, in fact this is fixed by now:

julia> @btime mean(gen1)
  4.321 ms (1 allocation: 16 bytes)

julia> @btime mean(gen2)
  3.628 ms (1 allocation: 16 bytes)
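
For anyone still on a Julia version where the zip generator allocates, one possible way around it is to index both collections directly instead of zipping them. The following is only a minimal sketch: `mean_dot`, `xs` and `ys` are hypothetical names that do not appear in the thread, and it assumes the extra allocations come from the zip iteration itself rather than from `dot`.

```julia
using LinearAlgebra   # dot
using Statistics      # mean, used only for the comparison at the end

# Hypothetical helper (not from the thread): mean of the pairwise dot
# products, computed by indexing both collections instead of zipping them.
function mean_dot(xs, ys)
    isempty(xs) && throw(ArgumentError("inputs must be non-empty"))
    total = 0.0                      # assumes Float64 data, as in the question
    for i in eachindex(xs, ys)       # errors if xs and ys have different indices
        total += dot(xs[i], ys[i])
    end
    return total / length(xs)
end

xs = [randn(5) for _ in 1:1000]
ys = [randn(5) for _ in 1:1000]

# Both approaches should agree up to floating-point rounding.
@assert mean_dot(xs, ys) ≈ mean(dot(a, b) for (a, b) in zip(xs, ys))
```

When timing it, interpolating the arguments, as in `@btime mean_dot($xs, $ys)`, keeps the cost of accessing non-constant globals out of the measurement.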