FLoops for updating a vector by index

UPDATE: Oh, I just noticed that the elements, i.e. the indices, can appear multiple times, which could lead to race conditions: two tasks may both read `du[k]` before either writes it back, so one of the updates is lost. Sorry, it's very early in the morning here :smiley:
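If repeated indices are possible, one race-free option is to partition the *output* vector instead of the input. The following is a minimal sketch of that idea and has not been benchmarked here. `build_vector_bucketed!` and its `nchunks` keyword are hypothetical names, the sketch uses plain `Base.Threads` rather than FLoops, and it assumes `du` is an ordinary 1-based `Vector` with every index in `elements` lying in `1:length(du)`.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical race-free variant (not from the original post).
# Idea: split the output range 1:length(du) into `nchunks` contiguous blocks
# and give each task exclusive ownership of one block. All occurrences of a
# repeated index k fall into the same block, so no two tasks write to du[k].
function build_vector_bucketed!(du, elements; nchunks = nthreads())
    blocksize = cld(length(du), nchunks)
    # Serial pass: bucket each index by the block that owns it,
    # preserving the original order of `elements` within each bucket.
    buckets = [Int[] for _ in 1:nchunks]
    for k in elements
        push!(buckets[cld(k, blocksize)], k)
    end
    # Parallel pass: each bucket is processed by exactly one task.
    @threads for b in 1:nchunks
        for k in buckets[b]
            du[k] += sin(k)
        end
    end
    return nothing
end
```

Because each `du[k]` is touched by a single task in the original order, the result should match a serial loop exactly. The serial bucketing pass and the extra `Int` buffers cost time and memory, though, so whether this beats the explicit loop would have to be measured.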

Since you don’t want to perform a reduction, I wouldn’t use `@reduce` at all. Did you try the following straightforward variant?

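```julia
using FLoops

function build_vector_floop(du, elements)
    # In-place update, no reduction. Caveat (see UPDATE): repeated indices in
    # `elements` let two tasks update the same du[k] concurrently.
    @floop for k ∈ elements
        du[k] += sin(k)
    end
    return nothing
end
```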

For me, this gives

```
julia> @benchmark build_vector_floop(du, k) setup=(du=zeros(10_000_000); k=_k) evals=1
BenchmarkTools.Trial: 164 samples with 1 evaluation.
 Range (min … max):  13.154 ms …  15.656 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.374 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.497 ms ± 352.965 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▂▅▅█ ▁
  ▃▇▇██████▇▇▅▁▆▅▅▁▄▄▃▃▃▃▄▄▁▁▁▃▃▃▃▁▃▃▃▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  13.2 ms         Histogram: frequency by time           15 ms <

 Memory estimate: 3.67 KiB, allocs estimate: 51.

julia> @benchmark build_vector_explicit(du, k) setup=(du=zeros(10_000_000); k=_k) evals=1
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min … max):  50.402 ms …  54.720 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     51.398 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   51.471 ms ± 766.897 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▄  █            ▂
  ▄▄▄▁▆█▄██▄█▄▄▄▆▄▄▆▄▆▆███▄█▁▆▁▄▁▄▆█▁▄▁▄▆▆▄▁▁▄▁▁▆▁▁▁▁▁▁▄▁▁▁▁▁▄ ▁
  50.4 ms         Histogram: frequency by time         53.4 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

So, about 3.8x faster (median 13.4 ms vs. 51.4 ms) on my machine with 6 threads.
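For contrast, `@reduce` is meant for loops whose iterations fold into a single value. A small sketch, assuming the FLoops package is installed (`sum_sin` is a hypothetical name): summing `sin(k)` over the indices is a genuine reduction, and FLoops combines the per-task partial sums itself, so repeated indices are harmless there.

```julia
using FLoops

# Hypothetical example of a genuine reduction: every iteration contributes to
# one accumulator `s`, and FLoops merges the partial sums from each task.
function sum_sin(elements)
    @floop for k in elements
        @reduce s += sin(k)
    end
    return s  # `s` is not defined if `elements` is empty
end
```

Here nothing is written to shared memory inside the loop, which is exactly what the in-place `du[k] += sin(k)` update above does not satisfy.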
