Hi Guys,
I am running a model that uses a for-loop over 17 independent sites. The sites do not share memory with each other, but the inputs for all 17 sites are collected in a list whose elements are static arrays. I tried to parallelize the for-loop with Threads.@threads, using exactly 17 threads. However, I see quite different performance between a single run and the parallelized run. Here is the script:
function parallelizeTEM!(space_selected_models::Vector, space_forcing, space_spinup_forcing, loc_forcing_t, space_output, space_land, tem_info, ::ThreadsParallelization)
    Threads.@threads for space_index ∈ eachindex(space_forcing)
        if haskey(tem_info, :use_space_spinup_sequence) && tem_info.use_space_spinup_sequence
            @time coreTEM!(space_selected_models[space_index], space_forcing[space_index], space_spinup_forcing[space_index], loc_forcing_t, space_output[space_index], space_land[space_index], tem_info, tem_info.space_spinup_sequence[space_index])
        end
    end
    return nothing
end
Then, in the REPL:
julia> Threads.nthreads()
17
julia> space_index
1
julia> @btime coreTEM!($space_selected_models[space_index], $space_forcing[space_index], $space_spinup_forcing[space_index], $loc_forcing_t, $space_output[space_index], $space_land[space_index], $tem_info, $tem_info.space_spinup_sequence[space_index]);
21.492 s (31 allocations: 89.19 KiB)
julia> @time coreTEM!(space_selected_models[space_index], space_forcing[space_index], space_spinup_forcing[space_index], loc_forcing_t, space_output[space_index], space_land[space_index], tem_info, tem_info.space_spinup_sequence[space_index]);
21.392444 seconds (29 allocations: 30.578 KiB)
julia> parallelizeTEM!(space_selected_models, space_forcing, space_spinup_forcing, loc_forcing_t, space_output, space_land, tem_info, tem_info.run.parallelization);
38.327439 seconds (459 allocations: 277.391 KiB)
40.077714 seconds (626 allocations: 348.469 KiB)
40.396848 seconds (354 allocations: 267.891 KiB)
40.542786 seconds (993 allocations: 436.422 KiB)
41.710067 seconds (665 allocations: 340.641 KiB)
42.730745 seconds (843 allocations: 378.094 KiB)
44.097149 seconds (585 allocations: 314.688 KiB)
47.181715 seconds (468 allocations: 280.781 KiB)
47.940587 seconds (1.30 k allocations: 504.531 KiB)
48.239538 seconds (1.04 k allocations: 429.523 KiB)
48.585902 seconds (987 allocations: 422.234 KiB)
48.884628 seconds (723 allocations: 352.719 KiB)
49.047153 seconds (849 allocations: 376.125 KiB)
49.416455 seconds (1.30 k allocations: 507.266 KiB)
52.010567 seconds (1.48 k allocations: 545.211 KiB)
53.424446 seconds (1.34 k allocations: 522.047 KiB)
53.869446 seconds (1.50 k allocations: 547.492 KiB)
julia> for space_index in 1:17
    println(space_index)
    @btime coreTEM!($space_selected_models[space_index], $space_forcing[space_index], $space_spinup_forcing[space_index], $loc_forcing_t, $space_output[space_index], $space_land[space_index], $tem_info, $tem_info.space_spinup_sequence[space_index]);
end
1
21.164 s (31 allocations: 89.19 KiB)
2
22.174 s (31 allocations: 89.19 KiB)
3
20.878 s (31 allocations: 89.19 KiB)
4
21.739 s (31 allocations: 89.19 KiB)
5
21.431 s (31 allocations: 89.19 KiB)
6
21.357 s (31 allocations: 89.19 KiB)
7
21.552 s (31 allocations: 89.19 KiB)
...
You can see that a single site takes only about 21 seconds when run on its own, but each site takes between 38 and 54 seconds inside the threaded loop. Why is that, and how can I improve the performance? Thanks!
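In case it helps to reproduce this without my model, here is a minimal sketch of the same pattern. Everything in it is an assumption for illustration: `work!` is a made-up CPU-bound kernel standing in for coreTEM!, and `time_sites_threaded!`, the buffer size, and the step count are hypothetical. It times one site alone and then all 17 sites inside Threads.@threads, storing the per-site times instead of printing them from the threads.

```julia
using Base.Threads

# Hypothetical CPU-bound kernel standing in for coreTEM!.
# It only reads and writes its own buffer, so sites share no data.
function work!(buf::Vector{Float64}, nsteps::Int)
    for step in 1:nsteps
        @inbounds for i in eachindex(buf)
            buf[i] = muladd(buf[i], 0.999, 1.0e-3 * step)
        end
    end
    return nothing
end

# Run every site inside a threaded loop and record the time spent per site.
function time_sites_threaded!(buffers, nsteps)
    elapsed = zeros(length(buffers))
    Threads.@threads for site in eachindex(buffers)
        elapsed[site] = @elapsed work!(buffers[site], nsteps)
    end
    return elapsed
end

nsites = 17          # same number of sites as in my model
nsteps = 200_000     # arbitrary; adjust so one site takes a measurable time
buffers = [fill(1.0, 1_000) for _ in 1:nsites]

work!(buffers[1], 1)  # warm-up call so compilation is not timed

serial_time = @elapsed work!(buffers[1], nsteps)
threaded_times = time_sites_threaded!(buffers, nsteps)

println("Julia threads: ", Threads.nthreads(), ", CPU threads: ", Sys.CPU_THREADS)
println("one site alone: ", serial_time, " s")
println("per-site time inside @threads (min, max): ", extrema(threaded_times), " s")
```

If the per-site times in this sketch also grow when run with 17 threads, that would point to the machine (fewer physical cores than threads, memory bandwidth, CPU frequency scaling) rather than to coreTEM! itself. That is only my guess, not something I have verified.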