Try this:
julia> using Threads; using Future: randjump
julia> const RNGS = [randjump(MersenneTwister(),big(10)^20)]; foreach(_ -> push!(RNGS, randjump(last(RNGS),big(10)^20)), 2:Threads.nthreads())
julia> function f3()
s = 0.
RNG = RNGS[Threads.threadid()]
for i in 1:10^8
s += rand(RNG)
end
return s
end
f3 (generic function with 1 method)
This is what I get:
f0: rand()
286.134 ms (112 allocations: 9.22 KiB)
f1: RNG inside f
131.979 ms (238 allocations: 359.38 KiB)
f2: using const RNG = ...
9.223 s (111 allocations: 9.19 KiB)
f3: using const RNGS = ...
132.231 ms (111 allocations: 9.19 KiB)
The cost of creating the MersenneTwister is amortized over the long 10^8 iterations. f3 should be the fastest with fewer iterations.
Using 10^3 iterations instead, I get this:
f0: rand()
22.492 μs (110 allocations: 9.16 KiB)
f1: RNG inside f
27.087 μs (236 allocations: 359.31 KiB)
f2: using const RNG = ...
78.150 μs (108 allocations: 9.09 KiB)
f3: using const RNGS = ...
21.343 μs (110 allocations: 9.16 KiB)