Why is `TaskLocalRNG` faster than `Xoshiro` with multiple threads?

I was a bit confused by the docs:

> In a multi-threaded program, you should generally use different RNG objects from different threads or tasks in order to be thread-safe. However, the default RNG is thread-safe as of Julia 1.3 (using a per-thread RNG up to version 1.6, and per-task thereafter).

They generally recommend using an explicit RNG for each task, but don’t give a reason why. Since the default RNG is now thread-safe, what is the reason? Performance?
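For concreteness, here’s the kind of pattern I understand the docs to be recommending (just a sketch; the seeding scheme is mine, not from the docs):

```julia
using Random

# One explicitly constructed RNG per task, seeded deterministically so the
# results are reproducible no matter how the tasks get scheduled.
# (The base seed 1234 is purely illustrative.)
tasks = map(1:Threads.nthreads()) do i
    Threads.@spawn begin
        rng = Xoshiro(1234 + i)
        sum(rand(rng) for _ in 1:10^6)
    end
end
total = sum(fetch.(tasks))
```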

I ran a simple test on a Julia 1.8 nightly and was also confused by the results: using an explicit `Xoshiro` RNG is slower on average than using an explicit `TaskLocalRNG`. (The explicit `TaskLocalRNG` performs the same as not passing any RNG at all and using the default.)

```julia
julia> using Random, BenchmarkTools

julia> function f(rng_call, n)
           # Each task constructs its own RNG (a fresh Xoshiro, or a handle to
           # the task-local RNG) and draws n random numbers from it.
           Threads.@threads for _ in 1:Threads.nthreads()
               rng = rng_call()
               sum(rand(rng) for _ in 1:n)
           end
       end
f (generic function with 1 method)

julia> Threads.nthreads()
8
```

```julia
julia> @benchmark f(Random.Xoshiro, 10^6)
BenchmarkTools.Trial: 2151 samples with 1 evaluation.
 Range (min … max):  981.468 μs … 15.590 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       2.306 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     2.313 ms ±  1.369 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▃▂▁▁   ▁█▆▃▅▁▁▂ ▁▄▃                                         ▁
  ████████████████████▆▄▅▄▆▃▄▃▁▃▃▃▁▃▃▁▃▁▄▁▁▄▄▄▁▃▃▃▁▄▄▃▃▄▄▃▄▃▁▄ █
  981 μs        Histogram: log(frequency) by time      9.15 ms <

 Memory estimate: 5.00 KiB, allocs estimate: 81.
```

```julia
julia> @benchmark f(Random.TaskLocalRNG, 10^6)
BenchmarkTools.Trial: 2902 samples with 1 evaluation.
 Range (min … max):  1.026 ms …  16.076 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.088 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.716 ms ± 998.001 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▁                         ▂▅▅▃▃   ▁▃▂▁                  ▁ ▁
  ███▆▅▃▃▄▄▄▄▁▃▄▅▃▃▃▄▁▁▁▁█▇▇▄▆████████████▅▆▆▃▄▅▆▆▇▄▁▃▆▅▆▆▇██ █
  1.03 ms      Histogram: log(frequency) by time      3.76 ms <

 Memory estimate: 4.50 KiB, allocs estimate: 65.
```

Seems like you answered your own question in the title?


From https://github.com/JuliaLang/julia/pull/32407:

> The global random number generator (`GLOBAL_RNG`) is now thread-safe (and thread-local) ([#32407]).
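And since 1.7 the default RNG is the per-task `TaskLocalRNG`, which you can check directly (a minimal sketch, assuming Julia ≥ 1.7):

```julia
using Random

# Seeding the default RNG and seeding TaskLocalRNG() touch the same per-task
# state, so the two draws below should match on Julia ≥ 1.7.
Random.seed!(42)
a = rand(5)

Random.seed!(42)
b = rand(TaskLocalRNG(), 5)

a == b  # expected to be true: rand() draws from the task-local state
```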

My benchmark actually shows the opposite of what I expected: the default `TaskLocalRNG` is faster than using a separate `Xoshiro` RNG for each task.

it IS task-local

Sorry, but I don’t follow your point.

I wasn’t suggesting that it isn’t task-local. In my two examples I create each RNG explicitly inside each loop iteration, so those RNGs are also task-local. Tasks are sticky by default, so there shouldn’t be any thread migration either. So I don’t see any reason why using an explicit `Xoshiro` should be slower.
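If it helps, this is roughly how I’d check for migration (just a sketch; `migration_check` is an illustrative name, not from the benchmark above):

```julia
using Random
using Base.Threads

# Record the thread id at the start and end of each iteration's work.
# If any entry is true, that task moved between threads mid-iteration.
function migration_check(n)
    moved = zeros(Bool, nthreads())
    @threads for i in 1:nthreads()
        t0 = threadid()
        rng = Xoshiro()                 # fresh RNG, local to this task
        sum(rand(rng) for _ in 1:n)
        moved[i] = threadid() != t0
    end
    any(moved)
end
```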

They are basically the same, and by the way I don’t think `TaskLocalRNG` is faster: if you look at the minimum time, the explicit `Xoshiro` is faster.

They’re not the same. You can run the benchmark for more samples if you don’t believe those results are significant. On my system the average runtime was consistently about 20% slower for `Xoshiro` when using threads.

When I run the test with just a single thread, `Xoshiro` is slightly faster, as expected. The main reason I posted this question is that I’m wondering whether something inefficient is going on with the threading that I’m not aware of.
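Something like this is what I mean by a single-threaded comparison (just a sketch; results omitted since they vary by machine, and the RNGs are constructed up front so only the draws are timed):

```julia
using Random, BenchmarkTools

# Construct each RNG up front so only the random-number draws are timed.
xo = Xoshiro(1)
tl = TaskLocalRNG()

@btime sum(rand($xo) for _ in 1:10^6)
@btime sum(rand($tl) for _ in 1:10^6)
```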

Hmm, actually, after killing all other active processes on my server and rerunning, I’m unable to reproduce a consistent difference. Sorry for the noise. Closing this.