Scaling of @threads for "embarrassingly parallel" problem

My first instinct here is to modify your code as follows. The most important change is hoisting the assignment of t_r out of the inner loop, so it is computed once per column rather than once per element.

function map_fidelity2(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = zeros(N, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F[i, j] = propagate_splitting(t_r, V0; kwargs...)
        end
    end
    return F
end

There could be some false sharing here: different threads write to different columns of F, and while Julia arrays are column-major (so each thread's column is contiguous), writes near column boundaries can still land on a cache line touched by another thread.

To address false sharing, we might try allocating an independent vector per task, and then assemble the matrix at the end.

function map_fidelity3(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = Vector{Vector{Float64}}(undef, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        F_j = zeros(N)
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F_j[i] = propagate_splitting(t_r, V0; kwargs...)
        end
        F[j] = F_j
    end
    F = reduce(hcat, F)  # F_j is out of scope here; concatenate the collected columns
    return F
end

I’m not an expert at addressing false sharing issues, so perhaps there is a better way.
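One alternative that sidesteps shared writes entirely is to spawn one task per column and collect the results with fetch; each task returns its own freshly allocated column, so no two tasks ever write to the same array. This is just a sketch: the name map_fidelity4 and the stubbed propagate_splitting are mine, not from the original post.

```julia
# Stand-in for the real propagate_splitting, just so the sketch runs.
propagate_splitting(t_r, V0; kwargs...) = t_r + V0

function map_fidelity4(potential_depth_values, separation_time_values; kwargs...)
    # One task per column; each task allocates and returns its own column,
    # so tasks never write to shared memory.
    tasks = map(separation_time_values) do t_r
        Threads.@spawn [propagate_splitting(t_r, V0; kwargs...) for V0 in potential_depth_values]
    end
    return reduce(hcat, fetch.(tasks))  # N × M matrix, one column per task
end
```

Compared with @threads, @spawn also lets the scheduler balance the load if some columns are more expensive than others.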

Let’s try to set up a minimal working example (MWE) so others can actually try to execute something. As others have said, we probably need to know more about propagate_splitting. For now, I’ll just substitute a sleep call in its place:

julia> begin
           const V0 = 1.5
           const N, M = 22, 13
           propagate_splitting(t_r, V0) = (sleep(1); t_r + V0)
           
           potential_depth_values = rand(N)
           separation_time_values = rand(M)
       end;
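With this stub the expected timing is easy to reason about: each call sleeps for one second, so filling one column costs N = 22 seconds, and @threads parallelizes over the M = 13 columns. A rough check (no expected outputs shown, since the numbers depend on your machine and thread count):

```julia
julia> Threads.nthreads()   # the columns get distributed over this many threads

julia> @time F = map_fidelity2(potential_depth_values, separation_time_values);
```

Serially this would take N * M = 286 seconds; with @threads the ideal wall time is roughly 22 * cld(M, Threads.nthreads()) seconds. Note that since sleep touches no memory, this MWE can't exhibit false sharing at all; it only exercises scheduling.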