Random numbers and threads

How should one use random numbers in conjunction with threads? In the past I used the following pattern:

using Random
using Future

function simulation(rngs)
    Threads.@threads for i in 1:1000000
        rng = rngs[Threads.threadid()]  # pick the RNG assigned to this thread
        x = rand(rng)
        # do more stuff with rng
    end
end

baserng = MersenneTwister(0)
# jump each copy forward by a different amount so the streams do not overlap
rngs = [Future.randjump(baserng, i*10^15) for i in 1:Threads.nthreads()]
simulation(rngs)

However, it is my understanding that this is unsafe. As I understand it, Julia is free to start executing a task on one thread and continue executing it on another thread. For instance, Julia could start two tasks on thread 1, pause them, and finish them on thread 2 and thread 3. In that case thread 2 and thread 3 would both access rngs[1] and cause havoc.

  • Is my understanding correct?
  • As of Julia 1.7, can the described corruption actually occur? Or can it only happen with future Julia versions?
  • What would be a better pattern here?

I seem to remember that

help?> rand()
  rand([rng=GLOBAL_RNG], [S], [dims...])

has a generator argument. I believe you should be on the safe side using thread-local generator objects then?
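
For instance, a small sketch of passing an explicit generator (the variable names here are just illustrative):

using Random

rng = MersenneTwister(1234)  # a locally owned generator object
x = rand(rng)                # scalar draw from rng
v = rand(rng, 3)             # three draws from the same rng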

You can just use the default RNG; it is thread-safe. In Julia 1.7 it is not only thread-safe but also reproducible, independent of the thread scheduling: Julia 1.7 Highlights
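
For instance, a minimal sketch (illustrative names) relying only on the task-local default generator inside a threaded loop:

using Random

function threaded_draws(n)
    x = zeros(n)
    Threads.@threads for i in 1:n
        x[i] = rand()  # no explicit RNG object: uses the task-local default generator
    end
    return x
end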


But what about the overhead associated with concurrent access to the global random state (if that matters in the specific application)? For example:

julia> import Random
       function simulation_default(nthreads)
           s = zeros(nthreads)
           Threads.@threads for it in 1:nthreads
               for j in it:nthreads:10^6 # easy splitter
                   s[it] += rand()
               end
           end
           sum(s)
       end
simulation_default (generic function with 1 method)

julia> import Random
       function simulation_perthread(nthreads)
           s = zeros(nthreads)
           Threads.@threads for it in 1:nthreads
               rng = Random.MersenneTwister()
               for j in it:nthreads:10^6 # easy splitter
                   s[it] += rand(rng)
               end
           end
           sum(s)
       end
simulation_perthread (generic function with 1 method)

julia> @btime simulation_default(4);
  3.945 ms (22 allocations: 2.06 KiB)

julia> @btime simulation_perthread(4);
  2.405 ms (66 allocations: 80.25 KiB)

(PS: note that in this particular example, running without threading at all is faster than any of the alternatives.)


Ah, thanks for pointing that out! However, I would prefer to launch my simulations without side effects on the global RNG. Is that possible?

The pattern I’ve used above should be fine. For something more elegant, see: Sum result of Threads.foreach() - #10 by tkf

Thanks, I guess that solves my MWE. However, for my real application I need something more composable. E.g. there are multiple Threads.@threads loops involving rand scattered over many functions. What I would like is a thread-safe way to pass around RNG state.

There isn’t a global random state. There’s a per-task random state used by default, if I understand correctly.

In general, now that Julia 1.7 has introduced task migration between threads, it is problematic to use threadid to emulate thread-local state; one should instead use task-local state.
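
As a minimal sketch of task-local (rather than threadid-indexed) RNG state, assuming a simple chunked sum with illustrative names: each spawned task owns its generator, so task migration cannot make two tasks share one RNG.

using Random

function simulation_tasklocal(n; ntasks = Threads.nthreads())
    chunks = Iterators.partition(1:n, cld(n, ntasks))
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            rng = Xoshiro()  # this task's private generator
            s = 0.0
            for _ in chunk
                s += rand(rng)
            end
            s
        end
    end
    return sum(fetch.(tasks))
end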


Yes, because the “global” RNG is actually task-local state.

Uhm… so what explains the best performance of simulation_perthread above?

I think this is a safe way:

julia> using Random

julia> ntasks = 3 # not necessarily equal to the number of threads
3

julia> const rngs = [ MersenneTwister() for i in 1:ntasks ]
3-element Vector{MersenneTwister}:
 MersenneTwister(0x38c65f20ec9d5c7eccf63a103295d3c)
 MersenneTwister(0xfd4719ad02ca867d1ba23e9810767b28)
 MersenneTwister(0x8ce6e5b0de492eddd0b904d019c2cd1d)

julia> function simulation_pertask(ntasks)
           s = zeros(ntasks)
           Threads.@threads for it in 1:ntasks
               rng = rngs[it]
               for j in it:ntasks:10^6 
                   s[it] += rand(rng)
               end
           end
           sum(s)
       end
simulation_pertask (generic function with 1 method)

julia> @btime simulation_pertask(3);
  2.467 ms (22 allocations: 2.05 KiB)

Note that threadid() is never used, and each task has a unique index it, which is independent of the actual number of threads used.

using Random
Random.seed!(0)
@show rand()
Random.seed!(0)
@show rand()
function doit()
    Threads.@threads for i in 1:100
        rand()
    end
end
Random.seed!(0)
doit()
@show rand()

gives me

rand() = 0.4056994708920292
rand() = 0.4056994708920292
rand() = 0.08910091331143466

So there seems to be a side effect?

That is a cool trick. It may create load-balancing headaches, however.

This is not even the same RNG as the default (Xoshiro).
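
A quick check of that, assuming Julia ≥ 1.7:

using Random

Random.default_rng()       # returns TaskLocalRNG(), the task-local Xoshiro-based generator
typeof(MersenneTwister())  # MersenneTwister, the (different) generator constructed per thread above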


Threads.@threads is still allowed to run things in the main task (and it probably does; that task is already running, so why not use it for one of the chunks?).

Yes, true, that changed as well in 1.7. Thanks (the Xoshiro-based default is indeed slower in that minimal test).

The trick some time ago was doing something like this:

import Random, Future

RNG = [Future.randjump(Random.MersenneTwister(options.seed), big(10)^20)]
foreach(_ -> push!(RNG, Future.randjump(last(RNG), big(10)^20)), 2:Threads.nthreads())

I don’t know if this is still recommended in any case. It is in one of my packages, though, where the parallel use of rand was a bottleneck.

edit: Though I think that this was to avoid the overlap of the sequences. I can’t find the thread exactly where I got that from.

I am confused by these two answers; they seem contradictory to me. So how do I do thread-safe RNG without side effects on the global RNG?

The task-local RNG is seeded with a random number drawn from the RNG of the current task, so spawning TaskB from TaskA advances the RNG of TaskA.


Ok, I think I get it. If I spawn things from my main task, its RNG state changes. If I spawn things from another task, that other task's RNG state changes, but the main task's RNG state is untouched.

using Random

println("no spawning")
Random.seed!(0)
@show rand()
println("no spawning")
Random.seed!(0)
@show rand()
println("spawning once from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
end
@show rand()

println("spawning once from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
end
@show rand()
println("spawning twice from main")
Random.seed!(0)
@sync begin
    Threads.@spawn nothing
    Threads.@spawn nothing
end
@show rand()

println("spawning once from child")
task = Task() do 
    Threads.@spawn nothing
    rand()
end
Random.seed!(0)
@sync schedule(task)

@show rand()

gives me

no spawning
rand() = 0.4056994708920292
no spawning
rand() = 0.4056994708920292
spawning once from main
rand() = 0.6616126907308237
spawning once from main
rand() = 0.6616126907308237
spawning twice from main
rand() = 0.2895098423219379
spawning once from child
rand() = 0.4056994708920292
0.4056994708920292

Maybe I misinterpreted a point here. You can tune the load balancing in this pattern by setting the number of tasks (or the task size). That is actually a good way to control how the parallel code runs, depending on the problem. And you can (with current Julia) emulate the future behavior of @threads using @spawn.

For example, here I create a function which may have a bad load balancing, because each iteration sleeps for a fraction of a second:

julia> function test(lapses,ntasks)
           s = zeros(ntasks)
           Threads.@sync for it in 1:ntasks
               Threads.@spawn for i in it:ntasks:length(lapses)
                   sleep(lapses[i])
                   s[it] += lapses[i]
               end
           end
           sum(s)
       end
test (generic function with 1 method)

julia> lapses = [ 1e-3*i for j in 1:50 for i in 1:4 ];

julia> sum(lapses) # total sleep time
0.5000000000000001

julia> @btime test($lapses,1) # one task
  756.380 ms (1027 allocations: 31.58 KiB)
0.5000000000000003

julia> @btime test($lapses,4) # ntasks == nthreads
  258.188 ms (1046 allocations: 33.19 KiB)
0.5000000000000003

julia> @btime test($lapses,20) # ntasks >> nthreads
  50.320 ms (1141 allocations: 42.12 KiB)
0.5

Thus, if the tasks are very heterogeneous, you can improve balancing by controlling the number of tasks. Ideally, the code can set up a suitable task size depending on the given input, taking the problem characteristics into consideration and avoiding both excessive and insufficient task spawning, both of which can be detrimental to performance.
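
As a rough illustration of that trade-off (the function name and the oversubscription factor below are assumed, tunable choices, not a recommendation from this thread):

# oversubscribe the number of tasks by a small factor when the per-item cost
# varies a lot, so idle threads can pick up the remaining tasks
choose_ntasks(nitems; factor = 4) = min(nitems, factor * Threads.nthreads())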

Anyway, this may be off-topic, but who knows.
