I have the following code which loops over some indices and adds values to two separate parts of an array.
using FLoops, Random
function update_u!(du, u, k, q, j)
v = [k, (k + 1) % length(u) + 1, (k + 7) % length(u) + 1]
q .= [u[v[1]], u[v[3]]]
du[v[j]] += q' * q
du[v[(j%3)+1]] -= sin.(q)' * sin.(q)
nothing
end
function do_serial(du::AbstractVector{T}, u, elements) where {T}
q = zeros(2, length(elements), 3)
fill!(du, zero(T))
for k โ elements
for j in 1:3
@views update_u!(du, u, k, q[:, k, j], j)
end
end
nothing
end
function do_parallel(du::AbstractVector{T}, u, elements) where {T}
q = zeros(2, length(elements), 3)
@floop for k โ elements
for j in 1:3
@views update_u!(du, u, k, q[:, k, j], j)
end
end
nothing
end
n = 10
u = rand(n)
du_serial = zero(u)
du_parallel = zero(u)
k = randperm(n)
do_serial(du_serial, u, k)
do_parallel(du_parallel, u, k)
[du_serial du_parallel abs.(du_serial .- du_parallel)]
10ร3 Matrix{Float64}:
0.316201 0.316201 0.0
0.126995 0.126995 2.77556e-17
0.472551 0.472551 0.0
โฎ
0.0436279 0.230509 0.186881
0.574818 0.574818 0.0
0.0795132 0.370962 0.291448
I should want these final results to be identical (i.e. have the same results when computing serially and with @floops
. How can I modify the code so that this works with FLoops.jl
? The code above does sometimes work, but for the larger examples that I do need (n \in (10000, 50000)), the errors accumulate too significantly to handle.
I canโt seem to identify where the issue is actually happening, since I already extended the array q
to have a column for each k
and for each j
to avoid data races there. Is it possible to build reductions on individual vector elements?