Future.randjump peculiar speed

At my company we typically break simulations into predefined fixed “chunks” and assign a seed to each chunk; this way, the dynamic thread scheduling is repeatable no matter the number of threads.

I have been using Future.randjump to generate a new seed/state for each chunk. After experimenting a bit, it seems that randjump is pretty slow even for the default jump of big(10)^20. I thought this jump was supposed to be precomputed, so using this jump amount would make randjump faster.

In fact, I noticed no change in runtime for generating 10 seeds, whether I used the default of big(10)^20 or not.

Any thoughts on how to speed up randjump, or why the default isn’t faster? In my “real world” code, I am calling randjump from separate threads to speed things up, but it is actually slower than my simulation.

using Future, Random

seed = 1234
nchunks = 10
jumpamt = big(10)^20

a = fill(MersenneTwister(),nchunks)

m = MersenneTwister(seed)
a[1] = m

for i = 2:nchunks
    @inbounds a[i] = Future.randjump(m, jumpamt)
end

Looking at calc_jump (called via randjump), it looks like the polynomial for any number of steps is cached. So I guess you don’t see the difference if you use the same step count more than a couple of times.
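A quick way to see the caching effect (a minimal sketch; the step count here is arbitrary and the timings will vary by machine):

```julia
using Future, Random

m = MersenneTwister(0)
steps = big(10)^6  # an arbitrary non-default jump size, for illustration

# The first call with a given step count computes the jump polynomial
# and stores it in calc_jump's cache.
@time Future.randjump(m, steps)

# A subsequent call with the same step count reuses the cached polynomial.
@time Future.randjump(m, steps)
```

The second @time should report a much smaller figure, since only the polynomial computation is cached, not the rest of the jump.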

To be fair, compared to the rest of what Future.randjump does, the computation cached by calc_jump does seem to be much faster for big(10)^20 than for other values:

julia> @btime Future.randjump(m, big(10)^20) setup=(m = MersenneTwister(0));
  14.303 ms (17 allocations: 22.77 KiB)

julia> @btime Random.DSFMT.GF2X(Random.DSFMT.JPOLY1e20);
  17.986 μs (5 allocations: 7.52 KiB)

julia> @btime Random.DSFMT.powxmod(big(10)^20 + 1, Random.DSFMT.CharPoly());
  92.881 ms (17266 allocations: 12.34 MiB)

By the way, I just realized that calc_jump uses a bare (unlocked) Dict to cache the jump polynomials.

So calling randjump from different threads seems to be a bit dangerous, unless you call randjump first on the main thread to populate the cache.
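One workaround, assuming the Dict is only unsafe under concurrent writes, is to warm the cache serially before spawning threaded work (a sketch, not a guaranteed thread-safety fix):

```julia
using Future, Random

jumpamt = big(10)^20

# Populate calc_jump's cache once, on the main thread, so the later
# threaded calls with the same step count should only read the Dict.
Future.randjump(MersenneTwister(0), jumpamt)

rngs = Vector{MersenneTwister}(undef, Threads.nthreads())
Threads.@threads for i in eachindex(rngs)
    # Same jump amount as the warm-up call, so this should hit the cache.
    rngs[i] = Future.randjump(MersenneTwister(i), jumpamt)
end
```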

Just a nitpick, but I guess you meant to write a[i-1] instead of m?
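For reference, the loop with that fix applied might look like this (same variables as the original snippet):

```julia
using Future, Random

seed = 1234
nchunks = 10
jumpamt = big(10)^20

a = Vector{MersenneTwister}(undef, nchunks)
a[1] = MersenneTwister(seed)
for i = 2:nchunks
    # Jump from the previous chunk's state, not from the original m,
    # so consecutive chunks get consecutive non-overlapping streams.
    a[i] = Future.randjump(a[i-1], jumpamt)
end
```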