My first instinct here is to modify your code as follows. The most important part is assigning t_r
outside of the inner loop.
function map_fidelity2(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = zeros(N, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F[i, j] = propagate_splitting(t_r, V0; kwargs...)
        end
    end
    return F
end
There could be some false sharing here, since all threads write into the same matrix F and the boundary elements of adjacent columns can land on the same cache line. To address that, we might try allocating an independent vector per task and then assembling the matrix at the end.
function map_fidelity3(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = Vector{Vector{Float64}}(undef, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        F_j = zeros(N)  # column buffer owned exclusively by this task
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F_j[i] = propagate_splitting(t_r, V0; kwargs...)
        end
        F[j] = F_j
    end
    return reduce(hcat, F)  # assemble the N×M matrix from the per-task columns
end
I’m not an expert at addressing false sharing issues, so perhaps there is a better way.
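One alternative that sidesteps shared writes entirely is to spawn one task per column and collect the results with fetch. This is just a sketch (map_fidelity4 is a name I'm making up here, and it assumes propagate_splitting is safe to call from multiple threads), so I make no claim it's faster than the versions above:

function map_fidelity4(potential_depth_values, separation_time_values; kwargs...)
    # One task per column: each task fills and returns its own freshly
    # allocated column, so no task ever writes into memory shared with another.
    tasks = map(eachindex(separation_time_values)) do j
        Threads.@spawn begin
            t_r = separation_time_values[j]
            [propagate_splitting(t_r, V0; kwargs...) for V0 in potential_depth_values]
        end
    end
    return reduce(hcat, fetch.(tasks))  # N×M matrix, columns in original order
end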
Let’s try to set up a minimal working example (MWE) so others can actually try to execute something. As others have said, we probably need to know more about propagate_splitting. For now, I’ll just substitute sleep in:
julia> begin
           const V0 = 1.5
           const N, M = 22, 13
           propagate_splitting(t_r, V0) = (sleep(1); t_r + V0)
           potential_depth_values = rand(N)
           separation_time_values = rand(M)
       end;
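With that stand-in in place, a quick way to exercise the variants looks like this (I’m deliberately not quoting timings, since they depend entirely on Threads.nthreads() and on the sleep placeholder; F2 and F3 are just names for the check):

@time F2 = map_fidelity2(potential_depth_values, separation_time_values)
@time F3 = map_fidelity3(potential_depth_values, separation_time_values)
F2 ≈ F3  # sanity check: both variants should produce the same matrix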