Scaling of @threads for "embarrassingly parallel" problem

My first instinct here is to modify your code as follows. The most important change is hoisting the assignment of t_r out of the inner loop, so it is computed once per column rather than once per element.

function map_fidelity2(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = zeros(N, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F[i, j] = propagate_splitting(t_r, V0; kwargs...)
        end
    end
    return F
end

There could be some false sharing here: different threads write to different columns of F, and while Julia arrays are column-major (so each thread's column is contiguous), writes near column boundaries can still land on a cache line touched by another thread.

To address false sharing, we might try allocating an independent vector per task, and then assemble the matrix at the end.

function map_fidelity3(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    F = Vector{Vector{Float64}}(undef, M)
    Threads.@threads for j in eachindex(separation_time_values)
        @inbounds t_r = separation_time_values[j]
        F_j = zeros(N)
        @inbounds for i in eachindex(potential_depth_values)
            V0 = potential_depth_values[i]
            F_j[i] = propagate_splitting(t_r, V0; kwargs...)
        end
        F[j] = F_j
    end
    F = reduce(hcat, F)  # F_j is out of scope here; concatenate the collected columns
    return F
end

I’m not an expert at addressing false sharing issues, so perhaps there is a better way.
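One alternative that sidesteps shared writes entirely is to spawn one task per column and collect the results with fetch; each task returns its own freshly allocated column, so no two tasks ever write to the same array. This is just a sketch: the name map_fidelity4 and the stubbed propagate_splitting are mine, not from the original post.

```julia
# Stand-in for the real propagate_splitting, just so the sketch runs.
propagate_splitting(t_r, V0; kwargs...) = t_r + V0

function map_fidelity4(potential_depth_values, separation_time_values; kwargs...)
    # One task per column; each task allocates and returns its own column,
    # so tasks never write to shared memory.
    tasks = map(separation_time_values) do t_r
        Threads.@spawn [propagate_splitting(t_r, V0; kwargs...) for V0 in potential_depth_values]
    end
    return reduce(hcat, fetch.(tasks))  # N × M matrix, one column per task
end
```

Compared with @threads, @spawn also lets the scheduler balance the load if some columns are more expensive than others.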

Let’s try to set up a minimal working example (MWE) so others can actually try to execute something. As others have said, we probably need to know more about propagate_splitting. For now, I’ll just substitute a sleep call in its place:

julia> begin
           const V0 = 1.5
           const N, M = 22, 13
           propagate_splitting(t_r, V0) = (sleep(1); t_r + V0)
           
           potential_depth_values = rand(N)
           separation_time_values = rand(M)
       end;
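With this stub the expected timing is easy to reason about: each call sleeps for one second, so filling one column costs N = 22 seconds, and @threads parallelizes over the M = 13 columns. A rough check (no expected outputs shown, since the numbers depend on your machine and thread count):

```julia
julia> Threads.nthreads()   # the columns get distributed over this many threads

julia> @time F = map_fidelity2(potential_depth_values, separation_time_values);
```

Serially this would take N * M = 286 seconds; with @threads the ideal wall time is roughly 22 * cld(M, Threads.nthreads()) seconds. Note that since sleep touches no memory, this MWE can't exhibit false sharing at all; it only exercises scheduling.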