Performance differences between `Array` and `Tuple` when combined with `zip` over an index range in a `for` loop

julia> v1 = rand(20);

julia> v2 = rand(20);

julia> @benchmark for i in v1, j in v2
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.300 μs … 412.600 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.219 μs ±  15.585 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆██▇▇▆▄▃▂▂▁▁▁▂▂▂▂▁▁▁   ▁▁▁▁▁                                 ▂
  █████████████████████▇███████▇▇▇▆▆▆▇▆▆▆▆▇▆▆▇▆▆▆▆▇▆▅▅▆▆▅▅▆▆▅▅ █
  21.3 μs       Histogram: log(frequency) by time      95.6 μs <

 Memory estimate: 25.94 KiB, allocs estimate: 1240.

julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  49.600 μs … 87.824 ms  ┊ GC (min … max):  0.00% … 99.83%
 Time  (median):     67.600 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   93.756 μs ±  1.223 ms  ┊ GC (mean ± σ):  18.44% ±  1.41%

   ▃▅▄▄▄██▆▅▅▄▄▅▄▃▂▁▁▁▂▂▂▂▁ ▁▁▁▁▁▁                            ▂
  ██████████████████████████████████████▇█▇▇▆▆▆▆▆▆▆▅▆▆▄▅▅▄▆▅▅ █
  49.6 μs      Histogram: log(frequency) by time       185 μs <

 Memory estimate: 86.97 KiB, allocs estimate: 2983.

julia> v1 = v1 |> Tuple;

julia> v2 = v2 |> Tuple;

julia> @benchmark for i in v1, j in v2
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  151.700 μs … 646.300 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     173.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   189.262 μs ±  46.275 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▆█▇▄▅▆▅▃▃▄▄▃▃▃▃▃▃▂▂▁▁▁▂▂▂▂▁▁ ▁                               ▂
  ███████████████████████████████████▇▇▇▇▇▆▆▆▆▆▆▆▆▇▆▆▆▅▄▆▆▅▅▅▄▄ █
  152 μs        Histogram: log(frequency) by time        385 μs <

 Memory estimate: 32.50 KiB, allocs estimate: 1660.

julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   53.300 μs … 123.600 ms  ┊ GC (min … max):  0.00% … 99.88%
 Time  (median):      87.800 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   146.455 μs ±   2.253 ms  ┊ GC (mean ± σ):  30.24% ±  2.00%

  ▅▅▂ ▂▃█▇▆▅▅▃▃▄▄▂▂▃▂▂▁▁▁▁   ▁                                  ▂
  █████████████████████████████▇▇▇▆▇▇▆▆▄▅▆▅▅▆▃▄▅▅▄▄▃▂▃▂▃▂▃▂▂▂▂▃ █
  53.3 μs       Histogram: log(frequency) by time        334 μs <

 Memory estimate: 162.44 KiB, allocs estimate: 2983.

I would like to know why the performance is better when v1 and v2 are Vectors. Is it because a Tuple gets harder for the compiler to optimize as it grows larger (even when all of its entries have the same type)?

Also, I don’t understand why adding the zip over the index range actually makes the Tuple case faster.

Thank you!

You’re benchmarking in global scope. Wrapping your code in functions gives much better performance (and Vectors and Tuples end up quite similar):

using BenchmarkTools

function loop(v1, v2)
    x = zero(eltype(v1))
    for i in v1, j in v2
        x += i * j   # accumulate so the loop has an observable result
    end
    x
end

function loop_zip(v1, v2)
    x = zero(eltype(v1))
    N = length(v1)
    for (i, i1) in zip(v1, 1:N), (j, j1) in zip(v2, 1:N)   # i1, j1 unused, as in the original example
        x += i * j
    end
    x
end

v1 = rand(20)
v2 = rand(20)

t1 = Tuple(v1)
t2 = Tuple(v2)

@benchmark loop($v1, $v2)
@benchmark loop_zip($v1, $v2)

@benchmark loop($t1, $t2)
@benchmark loop_zip($t1, $t2)
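
(A note on the `$` signs, not part of the benchmark code itself: interpolating the globals with `$` is standard BenchmarkTools practice. It splices the values into the benchmarked expression, so the timing does not also include looking up untyped non-const globals. Compare, assuming the definitions above:)

@benchmark loop(v1, v2)    # also pays for accessing the non-const globals v1, v2
@benchmark loop($v1, $v2)  # values interpolated: only the call itself is timed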

Results:

julia> @benchmark loop($v1, $v2)
BenchmarkTools.Trial: 10000 samples with 163 evaluations.
 Range (min … max):  602.018 ns …  1.089 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     627.669 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   646.577 ns ± 47.147 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    █▁  ▃▁▆▂  ▂ ▂      ▂      ▁        ▁                  ▁ ▁
  █▃▄▅▅██████████████▇▆▇▇▆█▆▇▇▇▆▅█▅▅▅▅▅▅▄▃█▄▃▁▄▅▄▄▃█▃▅▁▃▄▄▃▃▄█ █
  602 ns        Histogram: log(frequency) by time       886 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop_zip($v1, $v2)
BenchmarkTools.Trial: 10000 samples with 164 evaluations.
 Range (min … max):  624.652 ns …  1.229 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     626.732 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   649.457 ns ± 54.594 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂ ▂▃▄▁▁   ▄     ▃      ▂                                    ▁
  █████████████▇▇█▇██▇▇▇▇▇█▆▅▆▆▆▆▇▇▅▅▆▆▅▄▅█▅▄▄▄▄▃▄▄█▅▅▃▄▄▅▄▁▄█ █
  625 ns        Histogram: log(frequency) by time       920 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop($t1, $t2)
BenchmarkTools.Trial: 10000 samples with 174 evaluations.
 Range (min … max):  614.747 ns …  1.228 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     679.207 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   681.793 ns ± 54.629 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅▆    ▆█  ▂▂▁▆█▂▁▂▃▂▄▆▃▁▁▁▂▁▁ ▂ ▁▂▁    ▂        ▁▂           ▂
  ██▁▆████████████████████████████████▇██████▆▆▆▆▆███▄▅▆▆▅▅▅▇█ █
  615 ns        Histogram: log(frequency) by time       885 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop_zip($t1, $t2)
BenchmarkTools.Trial: 10000 samples with 159 evaluations.
 Range (min … max):  609.497 ns …  1.382 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     636.862 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   661.686 ns ± 62.211 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▂█▁  ▃▂▇▃▁ ▂ ▄▃▁ ▁  ▄▁     ▁ ▁     ▁         ▁         ▂ ▂
  ██▇▇████▇█████████████▇████▇█▇▆█▆█▆▆▆▅▆█▆▆▅▄▆▆▆▄▃█▃▅▃▂▄▄▄▃▃█ █
  609 ns        Histogram: log(frequency) by time       899 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
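
A side note (my own sketch, not part of the answer above): the global-scope penalty comes from v1 and v2 being non-const globals, so their types are unknown to the compiler and operations on them go through dynamic dispatch and allocate (which matches the allocation counts in your original benchmarks). Declaring the globals const, or interpolating them with $ as above, gives the compiler concrete types to work with:

# Hypothetical names, for illustration only: const globals have a fixed type,
# so code that uses them can specialize.
const w1 = rand(20)
const w2 = rand(20)

@benchmark loop(w1, w2)  # should be comparable to the $-interpolated version above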


Oh, that makes sense. Thank you!