LoopVectorization: @turbo performs worse than @inbounds on trivial loop

Seif_Shebl · August 25, 2021, 2:44am

Yes, on an old-ish i7-4790K (Haswell), the two perform more comparably:

julia> @benchmark foreachn!(dotsimd, $zs, $x, $y, $Ns)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  19.600 μs … 162.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.700 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.822 μs ±   2.164 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅  █   ▆  ▂                    ▁                             ▁
  █▁▁█▁▁▁█▁▁█▁▁▁▇▁▁▄▁▁▁▁▁▁▄▁▁▁▇▁▁█▁▁▁█▁▁▇▁▁▁▅▁▁▅▁▁▁▅▁▁▃▁▁▁▇▁▁▆ █
  19.6 μs       Histogram: log(frequency) by time      21.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark foreachn!(dotturbo, $zs, $x, $y, $Ns)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.300 μs …  32.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.400 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.434 μs ± 370.649 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆    █     ▆                                         ▁       ▁
  █▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▇▁▁▁▁▁█▁▁▁▁▁▇ █
  17.3 μs       Histogram: log(frequency) by time      18.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
