UPDATE: Oh, I just noticed that the elements, i.e. the indices, can appear multiple times, which could lead to race conditions. Sorry, it's very early in the morning here.
Since you don't want to perform a reduction, I wouldn't use @reduce at all. Did you try the following straightforward variant?
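using FLoops

# Caveat (see the update above): if `elements` contains repeated indices,
# concurrent `du[k] +=` updates from different threads can race.
function build_vector_floop(du, elements)
    @floop for k ∈ elements
        du[k] += sin(k)
    end
    return nothing
end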
For me, this gives
julia> @benchmark build_vector_floop(du, k) setup=(du=zeros(10_000_000); k=_k) evals=1
BenchmarkTools.Trial: 164 samples with 1 evaluation.
 Range (min … max):  13.154 ms …  15.656 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.374 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.497 ms ± 352.965 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  13.2 ms         Histogram: frequency by time          15 ms <
 Memory estimate: 3.67 KiB, allocs estimate: 51.
julia> @benchmark build_vector_explicit(du, k) setup=(du=zeros(10_000_000); k=_k) evals=1
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min … max):  50.402 ms …  54.720 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     51.398 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   51.471 ms ± 766.897 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  50.4 ms         Histogram: frequency by time        53.4 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.
So, about 3.8x faster on my machine (median 13.4 ms vs. 51.4 ms, with 6 threads).
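Because of the caveat in the update at the top (repeated indices make the concurrent du[k] += sin(k) updates race), here is a minimal sketch of one race-free alternative. It uses only Base threading, not FLoops: each task accumulates into its own private buffer, and the buffers are summed into du afterwards. The name build_vector_buffered! is hypothetical, and I haven't benchmarked it; note that it allocates one copy of du per task, which is not free for a 10_000_000-element vector.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical race-free variant (not FLoops): every task owns a private
# buffer, so repeated indices in `elements` never cause two tasks to
# write the same memory location concurrently.
function build_vector_buffered!(du, elements)
    isempty(elements) && return nothing
    # Split the index list into roughly one chunk per thread.
    chunksize = cld(length(elements), nthreads())
    chunks = collect(Iterators.partition(elements, chunksize))
    # One zero-initialized buffer per chunk, same shape as `du`.
    buffers = [zero(du) for _ in chunks]
    @sync for (i, chunk) in enumerate(chunks)
        @spawn begin
            buf = buffers[i]
            for k in chunk
                buf[k] += sin(k)
            end
        end
    end
    # Serial reduction: add each private buffer into `du`.
    for buf in buffers
        du .+= buf
    end
    return nothing
end
```

It is called like the FLoops version, e.g. build_vector_buffered!(du, _k). The trade-off is memory and a final serial pass over the buffers in exchange for correct results when indices repeat.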