Performance differences between `Array` and `Tuple` when combined with `zip` over an index range in a `for` loop

julia> v1 = rand(20);

julia> v2 = rand(20);

julia> @benchmark for i in v1, j in v2
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.300 μs … 412.600 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.219 μs ±  15.585 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆██▇▇▆▄▃▂▂▁▁▁▂▂▂▂▁▁▁   ▁▁▁▁▁                                 ▂
  █████████████████████▇███████▇▇▇▆▆▆▇▆▆▆▆▇▆▆▇▆▆▆▆▇▆▅▅▆▆▅▅▆▆▅▅ █
  21.3 μs       Histogram: log(frequency) by time      95.6 μs <

 Memory estimate: 25.94 KiB, allocs estimate: 1240.

julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  49.600 μs … 87.824 ms  ┊ GC (min … max):  0.00% … 99.83%
 Time  (median):     67.600 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   93.756 μs ±  1.223 ms  ┊ GC (mean ± σ):  18.44% ±  1.41%

   ▃▅▄▄▄██▆▅▅▄▄▅▄▃▂▁▁▁▂▂▂▂▁ ▁▁▁▁▁▁                            ▂
  ██████████████████████████████████████▇█▇▇▆▆▆▆▆▆▆▅▆▆▄▅▅▄▆▅▅ █
  49.6 μs      Histogram: log(frequency) by time       185 μs <

 Memory estimate: 86.97 KiB, allocs estimate: 2983.

julia> v1 = v1 |> Tuple;

julia> v2 = v2 |> Tuple;

julia> @benchmark for i in v1, j in v2
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  151.700 μs … 646.300 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     173.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   189.262 μs ±  46.275 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▆█▇▄▅▆▅▃▃▄▄▃▃▃▃▃▃▂▂▁▁▁▂▂▂▂▁▁ ▁                               ▂
  ███████████████████████████████████▇▇▇▇▇▆▆▆▆▆▆▆▆▇▆▆▆▅▄▆▆▅▅▅▄▄ █
  152 μs        Histogram: log(frequency) by time        385 μs <

 Memory estimate: 32.50 KiB, allocs estimate: 1660.

julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
       i * j
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   53.300 μs … 123.600 ms  ┊ GC (min … max):  0.00% … 99.88%
 Time  (median):      87.800 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   146.455 μs ±   2.253 ms  ┊ GC (mean ± σ):  30.24% ±  2.00%

  ▅▅▂ ▂▃█▇▆▅▅▃▃▄▄▂▂▃▂▂▁▁▁▁   ▁                                  ▂
  █████████████████████████████▇▇▇▆▇▇▆▆▄▅▆▅▅▆▃▄▅▅▄▄▃▂▃▂▃▂▃▂▂▂▂▃ █
  53.3 μs       Histogram: log(frequency) by time        334 μs <

 Memory estimate: 162.44 KiB, allocs estimate: 2983.

I would like to know why the performance is better when v1 and v2 are Vectors. Is it because a Tuple gets harder for the compiler to optimize as it grows larger (even when all of its entries have the same type)?

Also, I don’t understand why adding the zip over the index range actually makes the Tuple case faster.

Thank you!

You’re benchmarking in global scope. Wrapping your code in functions gives much better performance (and Vectors and Tuples end up quite similar):

using BenchmarkTools

function loop(v1, v2)
    x = zero(eltype(v1))
    for i in v1, j in v2
        x += i * j   # accumulate so the loop has an observable result
    end
    x
end

function loop_zip(v1, v2)
    x = zero(eltype(v1))
    N = length(v1)
    for (i, i1) in zip(v1, 1:N), (j, j1) in zip(v2, 1:N)   # i1, j1 unused, as in the original example
        x += i * j
    end
    x
end

v1 = rand(20)
v2 = rand(20)

t1 = Tuple(v1)
t2 = Tuple(v2)

@benchmark loop($v1, $v2)
@benchmark loop_zip($v1, $v2)

@benchmark loop($t1, $t2)
@benchmark loop_zip($t1, $t2)
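
(A note on the `$` signs, not part of the benchmark code itself: interpolating the globals with `$` is standard BenchmarkTools practice. It splices the values into the benchmarked expression, so the timing does not also include looking up untyped non-const globals. Compare, assuming the definitions above:)

@benchmark loop(v1, v2)    # also pays for accessing the non-const globals v1, v2
@benchmark loop($v1, $v2)  # values interpolated: only the call itself is timed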

Results:

julia> @benchmark loop($v1, $v2)
BenchmarkTools.Trial: 10000 samples with 163 evaluations.
 Range (min … max):  602.018 ns …  1.089 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     627.669 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   646.577 ns ± 47.147 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    █▁  ▃▁▆▂  ▂ ▂      ▂      ▁        ▁                  ▁ ▁
  █▃▄▅▅██████████████▇▆▇▇▆█▆▇▇▇▆▅█▅▅▅▅▅▅▄▃█▄▃▁▄▅▄▄▃█▃▅▁▃▄▄▃▃▄█ █
  602 ns        Histogram: log(frequency) by time       886 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop_zip($v1, $v2)
BenchmarkTools.Trial: 10000 samples with 164 evaluations.
 Range (min … max):  624.652 ns …  1.229 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     626.732 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   649.457 ns ± 54.594 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▂ ▂▃▄▁▁   ▄     ▃      ▂                                    ▁
  █████████████▇▇█▇██▇▇▇▇▇█▆▅▆▆▆▆▇▇▅▅▆▆▅▄▅█▅▄▄▄▄▃▄▄█▅▅▃▄▄▅▄▁▄█ █
  625 ns        Histogram: log(frequency) by time       920 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop($t1, $t2)
BenchmarkTools.Trial: 10000 samples with 174 evaluations.
 Range (min … max):  614.747 ns …  1.228 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     679.207 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   681.793 ns ± 54.629 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅▆    ▆█  ▂▂▁▆█▂▁▂▃▂▄▆▃▁▁▁▂▁▁ ▂ ▁▂▁    ▂        ▁▂           ▂
  ██▁▆████████████████████████████████▇██████▆▆▆▆▆███▄▅▆▆▅▅▅▇█ █
  615 ns        Histogram: log(frequency) by time       885 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loop_zip($t1, $t2)
BenchmarkTools.Trial: 10000 samples with 159 evaluations.
 Range (min … max):  609.497 ns …  1.382 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     636.862 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   661.686 ns ± 62.211 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▂█▁  ▃▂▇▃▁ ▂ ▄▃▁ ▁  ▄▁     ▁ ▁     ▁         ▁         ▂ ▂
  ██▇▇████▇█████████████▇████▇█▇▆█▆█▆▆▆▅▆█▆▆▅▄▆▆▆▄▃█▃▅▃▂▄▄▄▃▃█ █
  609 ns        Histogram: log(frequency) by time       899 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
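
A side note (my own sketch, not part of the answer above): the global-scope penalty comes from v1 and v2 being non-const globals, so their types are unknown to the compiler and operations on them go through dynamic dispatch and allocate (which matches the allocation counts in your original benchmarks). Declaring the globals const, or interpolating them with $ as above, gives the compiler concrete types to work with:

# Hypothetical names, for illustration only: const globals have a fixed type,
# so code that uses them can specialize.
const w1 = rand(20)
const w2 = rand(20)

@benchmark loop(w1, w2)  # should be comparable to the $-interpolated version above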


Oh, that makes sense. Thank you!