Reproducible simulations

At my job it is very important to have our algorithms be as close to reproducible as possible on different systems, no matter the number of cores. Even though, in my mind, it shouldn’t matter, the powers that be want this.

In our c# code, we run simulations and pre-determine based on the specific algorithm we are running the number of iterations to batch together. I use partition to create N tasks (no matter the number of cores) with a fixed number of iterations in each core.

  • set the number of blas threads to 1
  • create n tasks, by using partition to split the iterations that are assigned to each task. in this case each tasks gets 100 iterations or less for the remainder.
  • spawn the tasks (which should have its own task local rng)
  • re-set the number of blas threads

Is the following the most appropriate way to accomplish this with version 1.7 or later of Julia? Thanks for any insights.

function simulator(input::MyStruct, niter)
    Random.seed!(input.seed)

    nt = BLAS.get_num_threads()
    BLAS.set_num_threads(1)

    # pre-allocate where to store the results
    bigmat = Array{Float64}(undef, input.n, input.m, niter)

    @sync for part in partition(1:niter, 100)
        Threads.@spawn for j = 1:length(part)
            bigmat[:, :, part[j]] .= rand(input)
        end
    end
    BLAS.set_num_threads(nt)

    return bigmat
end

notice built-in random number sequence is subject to change between any versions of Julia

Thanks. I definitely understand that. For a given version of Julia, they would like to have the 40 core servers and 2 to 8 core laptops be as close as possible for the same number of simulations.

And for adding tests, I plan on figuring out how to adapt the code above to use the StableRNGs package and probably single threaded for the tests.

The code in the OP looks good to me. You don’t need to tweak BLAS.set_num_threads when you don’t use BLAS but I’d assume that the loop body is more complex and can contain BLAS operations in the actual code.

FYI, ThreadsX.jl and JuliaFolds packages support reproducible output unless you opt-out. It is independent of nthreads() when you specify the basesize option (something like the parameter value 100 in the OP).

2 Likes