julia> v1 = rand(20);
julia> v2 = rand(20);
julia> @benchmark for i in v1, j in v2
i * j
end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 21.300 μs … 412.600 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 25.400 μs ┊ GC (median): 0.00%
Time (mean ± σ): 30.219 μs ± 15.585 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▆██▇▇▆▄▃▂▂▁▁▁▂▂▂▂▁▁▁ ▁▁▁▁▁ ▂
█████████████████████▇███████▇▇▇▆▆▆▇▆▆▆▆▇▆▆▇▆▆▆▆▇▆▅▅▆▆▅▅▆▆▅▅ █
21.3 μs Histogram: log(frequency) by time 95.6 μs <
Memory estimate: 25.94 KiB, allocs estimate: 1240.
julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
i * j
end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 49.600 μs … 87.824 ms ┊ GC (min … max): 0.00% … 99.83%
Time (median): 67.600 μs ┊ GC (median): 0.00%
Time (mean ± σ): 93.756 μs ± 1.223 ms ┊ GC (mean ± σ): 18.44% ± 1.41%
▃▅▄▄▄██▆▅▅▄▄▅▄▃▂▁▁▁▂▂▂▂▁ ▁▁▁▁▁▁ ▂
██████████████████████████████████████▇█▇▇▆▆▆▆▆▆▆▅▆▆▄▅▅▄▆▅▅ █
49.6 μs Histogram: log(frequency) by time 185 μs <
Memory estimate: 86.97 KiB, allocs estimate: 2983.
julia> v1 = v1 |> Tuple;
julia> v2 = v2 |> Tuple;
julia> @benchmark for i in v1, j in v2
i * j
end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 151.700 μs … 646.300 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 173.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 189.262 μs ± 46.275 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▆█▇▄▅▆▅▃▃▄▄▃▃▃▃▃▃▂▂▁▁▁▂▂▂▂▁▁ ▁ ▂
███████████████████████████████████▇▇▇▇▇▆▆▆▆▆▆▆▆▇▆▆▆▅▄▆▆▅▅▅▄▄ █
152 μs Histogram: log(frequency) by time 385 μs <
Memory estimate: 32.50 KiB, allocs estimate: 1660.
julia> @benchmark for (i,i1) in zip(v1, 1:20), (j, j1) in zip(v2, 1:20)
i * j
end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 53.300 μs … 123.600 ms ┊ GC (min … max): 0.00% … 99.88%
Time (median): 87.800 μs ┊ GC (median): 0.00%
Time (mean ± σ): 146.455 μs ± 2.253 ms ┊ GC (mean ± σ): 30.24% ± 2.00%
▅▅▂ ▂▃█▇▆▅▅▃▃▄▄▂▂▃▂▂▁▁▁▁ ▁ ▂
█████████████████████████████▇▇▇▆▇▇▆▆▄▅▆▅▅▆▃▄▅▅▄▄▃▂▃▂▃▂▃▂▂▂▂▃ █
53.3 μs Histogram: log(frequency) by time 334 μs <
Memory estimate: 162.44 KiB, allocs estimate: 2983.
I would like to know why when v1
and v2
are Vector
s the performance is better. Is it because the type Tuple
is harder for the compiler to optimize when the size of it gets larger (even if the entries are the same type)?
Also, I don’t understand why when including the zip
for the index range in the Tuple
case, the performance actually gets better?
Thank you!