How to use random numbers in conjunction with threads? In the past I used the following pattern:
using Random
using Future  # Future.randjump lives in the Future stdlib

function simulation(rngs)
    Threads.@threads for i in 1:1000000
        rng = rngs[Threads.threadid()]
        x = rand(rng)
        # do more stuff with rng
    end
end

baserng = MersenneTwister(0)
rngs = [Future.randjump(baserng, i * 10^15) for i in 1:Threads.nthreads()]
simulation(rngs)
However, it is my understanding that this is unsafe. As I understand it, Julia is free to start executing a task on one thread and continue executing it on another. For instance, Julia could start two tasks on thread 1, pause them, and finish them on thread 2 and thread 3. In that case, thread 2 and thread 3 would both access rngs[1] and cause havoc.
Is my understanding correct?
As of Julia 1.7, can the described corruption actually occur? Or may it only happen with future Julia versions?
You can just use the default RNG; it is thread-safe. In Julia 1.7 it is not only thread-safe but also reproducible, independent of the thread scheduling: see Julia 1.7 Highlights
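For example, here is a minimal sketch (the names are mine, not from the thread) relying only on the default RNG; on Julia ≥ 1.7 this should be both safe and reproducible, because each task gets its own deterministically seeded RNG:

```julia
using Random

# Sum `n` uniform draws, splitting the indices across tasks; every task
# calls the plain `rand()`, i.e. its own task-local default RNG.
function simulation_shared(n)
    nt = Threads.nthreads()
    s = zeros(nt)
    Threads.@threads for it in 1:nt
        for j in it:nt:n
            s[it] += rand()
        end
    end
    return sum(s)
end

Random.seed!(0)
a = simulation_shared(10^6)
Random.seed!(0)
b = simulation_shared(10^6)
a == b  # expected to hold on Julia ≥ 1.7, independent of scheduling
```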
But what about the overhead associated with concurrent access to the global random state (if that matters in the specific application)? For example:
julia> import Random

julia> function simulation_default(nthreads)
           s = zeros(nthreads)
           Threads.@threads for it in 1:nthreads
               for j in it:nthreads:10^6 # easy splitter
                   s[it] += rand()
               end
           end
           sum(s)
       end
simulation_default (generic function with 1 method)
julia> function simulation_perthread(nthreads)
           s = zeros(nthreads)
           Threads.@threads for it in 1:nthreads
               rng = Random.MersenneTwister()
               for j in it:nthreads:10^6 # easy splitter
                   s[it] += rand(rng)
               end
           end
           sum(s)
       end
simulation_perthread (generic function with 1 method)
julia> using BenchmarkTools

julia> @btime simulation_default(4);
  3.945 ms (22 allocations: 2.06 KiB)

julia> @btime simulation_perthread(4);
  2.405 ms (66 allocations: 80.25 KiB)
(PS: note that in this particular example, running without threading at all is faster than any of the alternatives.)
Ah, thanks for pointing that out! However, I would prefer to launch my simulations without side effects on the global RNG. Is that possible?
Thanks, I guess that solves my MWE. However, for my real application I need something more composable. E.g. there are multiple Threads.@threads loops involving rand scattered over many functions. What I would like is a thread-safe way to pass around RNG state.
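One composable option (a sketch with my own names, not code from the thread) is to let every entry point take an explicit base RNG and derive one child RNG per task from it, e.g. with `Xoshiro` (available since Julia 1.7). The simulation's draws then come only from the caller-supplied RNG, never from the global default one (note that spawning tasks itself still advances the spawning task's default RNG on 1.7):

```julia
using Random

# Derive `n` independent child RNGs from a caller-owned base RNG.
child_rngs(base::AbstractRNG, n) = [Xoshiro(rand(base, UInt64)) for _ in 1:n]

function simulation(base::AbstractRNG, n; ntasks = Threads.nthreads())
    rngs = child_rngs(base, ntasks)
    s = zeros(ntasks)
    Threads.@threads for it in 1:ntasks
        rng = rngs[it]  # one RNG per task index, never threadid()
        for j in it:ntasks:n
            s[it] += rand(rng)
        end
    end
    return sum(s)
end

# Deterministic given the same base RNG and the same ntasks:
simulation(Xoshiro(0), 10^6) == simulation(Xoshiro(0), 10^6)  # true
```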
Uhm… so what explains the best performance of simulation_perthread above?
I think this is a safe way:
julia> using Random

julia> ntasks = 3 # not necessarily equal to the number of threads
3

julia> const rngs = [MersenneTwister() for i in 1:ntasks]
3-element Vector{MersenneTwister}:
 MersenneTwister(0x38c65f20ec9d5c7eccf63a103295d3c)
 MersenneTwister(0xfd4719ad02ca867d1ba23e9810767b28)
 MersenneTwister(0x8ce6e5b0de492eddd0b904d019c2cd1d)

julia> function simulation_pertask(ntasks)
           s = zeros(ntasks)
           Threads.@threads for it in 1:ntasks
               rng = rngs[it]
               for j in it:ntasks:10^6
                   s[it] += rand(rng)
               end
           end
           sum(s)
       end
simulation_pertask (generic function with 1 method)

julia> @btime simulation_pertask(3);
  2.467 ms (22 allocations: 2.05 KiB)
Note that threadid() is never used: each task has a unique index it, which is independent of the actual number of threads used.
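One caveat of the snippet above: rngs is a mutated global, so two consecutive calls to simulation_pertask give different results. If run-to-run reproducibility also matters, the per-task RNGs can be seeded inside the function instead (a sketch with a hypothetical name; adjacent integer seeds are fine for illustration, though jumped or properly split streams are statistically safer):

```julia
using Random

function simulation_pertask_seeded(ntasks; seed = 0)
    s = zeros(ntasks)
    Threads.@threads for it in 1:ntasks
        rng = MersenneTwister(seed + it)  # fresh, task-indexed RNG each call
        for j in it:ntasks:10^6
            s[it] += rand(rng)
        end
    end
    return sum(s)
end

simulation_pertask_seeded(3) == simulation_pertask_seeded(3)  # true
```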
using Random

Random.seed!(0)
@show rand()

Random.seed!(0)
@show rand()

function doit()
    Threads.@threads for i in 1:100
        rand()
    end
end

Random.seed!(0)
doit()
@show rand()
Threads.@threads is still allowed to run things in the main task (and it probably does; that task is already running, so why shouldn't it be used for one of the threads?).
The task-local RNG is seeded with a random number drawn from the RNG of the current task; spawning TaskB from TaskA therefore advances the RNG of TaskA.
OK, I think I get it. If I spawn things from my main task, its RNG state changes. If I spawn things from another task, that other task's RNG state changes, but the main task's RNG state is untouched.
using Random

println("no spawning")
Random.seed!(0)
@show rand()

println("no spawning")
Random.seed!(0)
@show rand()

println("spawning once from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
end
@show rand()

println("spawning once from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
end
@show rand()

println("spawning twice from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
    Threads.@spawn nothing
end
@show rand()

println("spawning once from child")
task = Task() do
    Threads.@spawn nothing
    rand()
end
Random.seed!(0)
@sync schedule(task)
@show rand()
no spawning
rand() = 0.4056994708920292
no spawning
rand() = 0.4056994708920292
spawning once from main
rand() = 0.6616126907308237
spawning once from main
rand() = 0.6616126907308237
spawning twice from main
rand() = 0.2895098423219379
spawning once from child
rand() = 0.4056994708920292
0.4056994708920292
Maybe I misinterpreted a point here. You can tune the load balancing in this pattern by setting the number of tasks (or the task size). That is actually a good way to control how the parallel code runs, depending on the problem. And with current Julia you can emulate the future behavior of @threads by using @spawn.
For example, here I create a function which may have a bad load balancing, because each iteration sleeps for a fraction of a second:
julia> function test(lapses, ntasks)
           s = zeros(ntasks)
           @sync for it in 1:ntasks
               Threads.@spawn for i in it:ntasks:length(lapses)
                   sleep(lapses[i])
                   s[it] += lapses[i]
               end
           end
           sum(s)
       end
test (generic function with 1 method)
julia> lapses = [ 1e-3*i for j in 1:50 for i in 1:4 ];
julia> sum(lapses) # total sleep time
0.5000000000000001
julia> @btime test($lapses,1) # one task
756.380 ms (1027 allocations: 31.58 KiB)
0.5000000000000003
julia> @btime test($lapses,4) # ntasks == nthreads
258.188 ms (1046 allocations: 33.19 KiB)
0.5000000000000003
julia> @btime test($lapses,20) # ntasks >> nthreads
50.320 ms (1141 allocations: 42.12 KiB)
0.5
Thus, if the tasks are very heterogeneous, you can improve the balancing by controlling the number of tasks. Ideally, the code can set up an optimal task size depending on the input, taking the problem's characteristics into consideration and avoiding both excessive and insufficient task spawning, both of which can be detrimental to performance.
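As a rough illustration of that trade-off (a hypothetical heuristic, not code from the thread): oversubscribe the threads a few times for load balancing, but never let a task's chunk drop below some minimum size:

```julia
# Choose how many tasks to spawn for `nwork` items: at most a few tasks per
# thread, at least one task, and no chunks smaller than `minchunk` items.
function choose_ntasks(nwork; minchunk = 1_000, oversubscribe = 4)
    return clamp(nwork ÷ minchunk, 1, oversubscribe * Threads.nthreads())
end

choose_ntasks(100)   # tiny workload: 1 task, threading is not worth it
choose_ntasks(10^6)  # large workload: several tasks per thread
```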