Different run-time performance when parallelizing with Threads

No, the data and output arrays are already loaded.

Did it, but the threaded run still takes about 2 times as long…

Quick update: I found that the @threads parallelization takes about the same amount of time as the single-threaded run on the login node, but not on the compute node reached via srun --pty…

(on login node)
julia> @btime coreTEM!($space_selected_models, $space_forcing[space_index], $space_spinup_forcing[space_index], $loc_forcing_t, 
                   $space_output[space_index], $space_land[space_index], $tem_info)
  29.113 s (7 allocations: 51.33 KiB)

julia> @btime runTEM!($info.models.forward, $run_helpers.space_forcing, $run_helpers.space_spinup_forcing, $run_helpers.loc_forcing_t, 
                   $run_helpers.space_output, $run_helpers.space_land, $run_helpers.tem_info)
  23.544 s (88 allocations: 56.81 KiB)

(on srun node)
julia> @btime coreTEM!($space_selected_models, $space_forcing[space_index], $space_spinup_forcing[space_index], $loc_forcing_t, 
                          $space_output[space_index], $space_land[space_index], $tem_info)
  20.707 s (7 allocations: 51.33 KiB)

julia> @btime runTEM!($info.models.forward, $run_helpers.space_forcing, $run_helpers.space_spinup_forcing, $run_helpers.loc_forcing_t, 
                          $run_helpers.space_output, $run_helpers.space_land, $run_helpers.tem_info)
  39.177 s (88 allocations: 56.81 KiB)