At my job it is very important to have our algorithms be as close to reproducible as possible on different systems, no matter the number of cores. Even though, in my mind, it shouldn’t matter, the powers that be want this.
In our c# code, we run simulations and pre-determine based on the specific algorithm we are running the number of iterations to batch together. I use partition to create N tasks (no matter the number of cores) with a fixed number of iterations in each core.
- set the number of blas threads to 1
- create n tasks, by using partition to split the iterations that are assigned to each task. in this case each tasks gets 100 iterations or less for the remainder.
- spawn the tasks (which should have its own task local rng)
- re-set the number of blas threads
Is the following the most appropriate way to accomplish this with version 1.7 or later of Julia? Thanks for any insights.
function simulator(input::MyStruct, niter)
Random.seed!(input.seed)
nt = BLAS.get_num_threads()
BLAS.set_num_threads(1)
# pre-allocate where to store the results
bigmat = Array{Float64}(undef, input.n, input.m, niter)
@sync for part in partition(1:niter, 100)
Threads.@spawn for j = 1:length(part)
bigmat[:, :, part[j]] .= rand(input)
end
end
BLAS.set_num_threads(nt)
return bigmat
end