I think the poor scaling you're seeing comes from the default global RNG, which can perform poorly when called from multiple threads.
IIRC, the cause is cache invalidation, i.e. false sharing: the default global RNG for each thread is stored next to the others in an array without any padding, so when one thread updates its RNG state, the other cores have to reload that cache line (or something like that).
[me@redmi ~]$ time ~/julia/bin/julia -t2 /tmp/error.jl
2
[DateTime("2020-08-09T15:24:23.878")]
2.499996828176266e9
real 0m20.020s
user 0m39.258s
sys 0m0.802s
[me@redmi ~]$ time ~/julia/bin/julia --inline=yes --optimize=3 --math-mode=fast --check-bounds=no -t8 /tmp/error.jl
8
[DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298"), DateTime("2020-08-09T15:24:54.298")]
2.500007301917459e9
2.500005161541428e9
2.4999788758251624e9
2.500002200890986e9
2.500027065633972e9
2.5000236820377455e9
2.499984928049075e9
real 0m39.881s
user 4m15.892s
sys 0m0.590s
[me@redmi ~]$ vim /tmp/error.jl # I made the code change below to explicit rng
[me@redmi ~]$ time ~/julia/bin/julia -t8 /tmp/error.jl
8
[DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239"), DateTime("2020-08-09T15:28:19.239")]
2.500025329225046e9
2.499977304118077e9
2.5000150674239006e9
2.4999986608286796e9
2.5000115835831914e9
2.4999994211369123e9
2.500023322898655e9
real 0m19.713s
user 2m5.908s
sys 0m0.546s
using Dates
import Random

function f(i)
    rng = Random.MersenneTwister(i)
    s = 0.
    for _ in 1:(5*10^9)
        s += rand(rng)
    end
    return s
end

function t()
    dates = Vector{DateTime}(undef, Threads.nthreads()-1)
    task = Vector{Task}(undef, Threads.nthreads()-1)
    for i in 1:Threads.nthreads()-1
        task[i] = Threads.@spawn f(i)
        dates[i] = Dates.now()
    end
    println(dates)
    for i in 1:Threads.nthreads()-1
        println(fetch(task[i]))
    end
end

println(Threads.nthreads())
t()
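
If you want to check the effect on your own machine without waiting for 5*10^9 draws, here is a rough, scaled-down sketch that times the global RNG against a per-task `MersenneTwister`. The names `sum_global`, `sum_explicit` and `run_tasks` and the size `n = 10^7` are made up for this illustration and are not from the script above. Note that on Julia 1.7 and later the default RNG is task-local, so the gap may well not show up there.

```julia
import Random

# Sum n draws from the default global RNG (seed is unused, kept so both
# functions have the same signature).
function sum_global(seed, n)
    s = 0.0
    for _ in 1:n
        s += rand()
    end
    return s
end

# Sum n draws from an RNG that only this task touches.
function sum_explicit(seed, n)
    rng = Random.MersenneTwister(seed)
    s = 0.0
    for _ in 1:n
        s += rand(rng)
    end
    return s
end

# Spawn ntasks tasks running work(i, n) and wait for all results.
function run_tasks(work, ntasks, n)
    tasks = [Threads.@spawn work(i, n) for i in 1:ntasks]
    return fetch.(tasks)
end

n = 10^7                      # hypothetical size, far below 5*10^9
ntasks = Threads.nthreads()

# Warm-up runs so compilation is not included in the timings.
run_tasks(sum_global, ntasks, 10)
run_tasks(sum_explicit, ntasks, 10)

t_global = @elapsed run_tasks(sum_global, ntasks, n)
t_explicit = @elapsed run_tasks(sum_explicit, ntasks, n)
println("global rng:   ", t_global, " s")
println("explicit rng: ", t_explicit, " s")
```

Run it with e.g. `julia -t8` and compare the two timings. This is only a sanity check under the assumptions above, not a careful benchmark.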