Multithreading and Monte Carlo in 1.7

I have an elementary Monte Carlo computation that I want to do that amounts to:

results = zeros(n_trials);
for i in 1:n_trials
    results[i] = # relatively expensive Monte Carlo function
end

I’m finding that the statistics of the data in results are different when this is executed serially (as above) than when it’s executed as a Threads.@threads for loop. I know there was a change in how random numbers were handled in 1.7, and I’m just wondering how to accomplish this correctly.

I’m fairly certain it’s a threading issue, since if I write the problem using Distributed with an @sync @distributed for loop, I get results statistically consistent with the serial case.
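Roughly, the distributed variant looks like this (a minimal sketch: expensive_mc is a stand-in for the real Monte Carlo function, and the worker count is arbitrary):

```julia
using Distributed, SharedArrays
addprocs(4)  # arbitrary worker count

# Stand-in for the relatively expensive Monte Carlo function
@everywhere expensive_mc() = rand()

n_trials = 10^4
results = SharedVector{Float64}(n_trials)
# each worker writes its iterations into the shared array
@sync @distributed for i in 1:n_trials
    results[i] = expensive_mc()
end
```

Since each worker is a separate process with its own default RNG, the streams can't collide the way task-local RNGs can under @threads.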

I think this question requires more details, like whether you were explicitly passing an RNG to your Monte Carlo function, or whether the evaluation depended on the GLOBAL_RNG state.

There’s no explicit passing or invocation of the RNG within the function.

Here’s an elementary version of what I am doing:

using Statistics
using BasicMD
using TestLandscapes
using ForwardDiff
using Random
using Printf
using HypothesisTests

V = x -> SymmetricDoubleWell(x[1]);

Δt = 1e-2 # time step
T = 10^2;
nΔt = Int(T / Δt);
β = 5.0;    # inverse temperature

cfg = ForwardDiff.GradientConfig(V, zeros(Float64, 1));
∇V! = (gradV, X) -> ForwardDiff.gradient!(gradV, V, X, cfg);

sampler = EM(∇V!, β, Δt);
x0 = Float64[-1.]

opts = MDOptions(n_iters = nΔt);


function f(X)
    return Float64(X[1] > -0.1 && X[1] < 0.1)
end

Random.seed!(100)
n_trials = 10^4;
results = zeros(n_trials);
for j in 1:n_trials
    obs_vals = sample_observables(x0, sampler, (f,), options=opts)[:];
    results[j] = mean(obs_vals);
end
confint(OneSampleTTest(results))

Random.seed!(200)
n_trials = 10^4;
results = zeros(n_trials);
Threads.@threads for j in 1:n_trials
    obs_vals = sample_observables(x0, sampler, (f,), options=opts)[:];
    results[j] = mean(obs_vals);
end
confint(OneSampleTTest(results))

It returns:

(0.0016863222849335538, 0.0017651777150664454) # serial loop
(0.004717963473953854, 0.004919976526046147) # threaded loop

That these two confidence intervals are non-overlapping strikes me as problematic.

You seem to be initializing the RNG to 100 and 200 for the two runs.
Does using the same init value change the results?

This is bizarre, but I’m now getting a different answer for the threaded loop that’s consistent with the serial one. To answer your question, changing seeds does change the result but in ways that are not too far off.

As a follow-up, I find that as soon as I have more than one thread running, the threaded loop is not reproducible, despite setting the seed explicitly. I’ve verified this on three machines (two Intel Macs and an Intel Linux box), all running 1.7.1.

Are you sure this sample_observables function does not modify the input variables? If it might, you may want to instantiate them inside the for loop for each trial.

Yeah, just to make sure, I switched both inner loops to be:

    x0_ = deepcopy(x0)
    sampler_ = deepcopy(sampler);
    f_ = deepcopy(f);
    opts_ = deepcopy(opts);
    obs_vals = sample_observables(x0_, sampler_, (f_,), options = opts_)[:]
    results[j] = mean(obs_vals)

and it’s got all the same problems.

The task RNGs are seeded from RandomDevice, which is not very performant. So the tasks end up with the same seeds, and your random calls produce correlated streams.

julia> n = 10                                                                          
julia> z = zeros(UInt64, n, 4);                                                                                                         
julia> Threads.@threads for i in 1:n
    z[i,1] = current_task().rngState0
    z[i,2] = current_task().rngState1
    z[i,3] = current_task().rngState2
    z[i,4] = current_task().rngState3
end

julia> z
10×4 Matrix{UInt64}:
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x46f1b23c9ca15399  0xf4dcd5ebcc1bba76  0x0b938b850dd2f860  0x01507390a13082df
 0x46f1b23c9ca15399  0xf4dcd5ebcc1bba76  0x0b938b850dd2f860  0x01507390a13082df
 0x414ebc7669ce613e  0xae1cacc9bacbc959  0x32ddb1fc917c932a  0x7ec97f6600242766
 0x414ebc7669ce613e  0xae1cacc9bacbc959  0x32ddb1fc917c932a  0x7ec97f6600242766

Try to manually seed the RNG within the loop.

julia> Threads.@threads for i in 1:n
    Random.seed!(i)
    z[i,1] = current_task().rngState0
    z[i,2] = current_task().rngState1
    z[i,3] = current_task().rngState2
    z[i,4] = current_task().rngState3
end

julia> z
10×4 Matrix{UInt64}:
 0xfff0241072ddab67  0xc53bc12f4c3f0b4e  0x56d451780b2dd4ba  0x50a4aa153d208dd8
 0xb0a79775455db226  0x10dd66f620963f46  0x7c9605a573432caa  0xce6e2d2a92708d7c
 0x62bea62705299f9d  0xb2196eb285598f6a  0x813136b07248b437  0x783171c16f41f41d
 0x4994fc2524515efb  0x1ee7eb6959c96e31  0x1ed633b8ba6d572d  0x02ee70fd30935b2a
 0xc3b1bf2ea9b69425  0xfb15c0017deb2d31  0xb5c67bbde9fbe995  0x10b9c73e8107af27
 0xa96d50024acaa87a  0x6fb77896888f3d13  0xdb2fe2025de46c71  0xf8efa0565ea1707b
 0xfef9c95b5a3f61e8  0x9db6807c8e2aa3ed  0x2317a9b6478e87d4  0x2b6a6b2384eb15fb
 0x3de06eb0605676dc  0x8c7e955bcad76fd1  0x8cf24a2cac615180  0xa12c43abb20a105a
 0x31c0fdb77e6b079f  0xf8bbbedb8c20d31c  0x1c19355ea0d34d01  0x402b368a357b9496
 0x06d7db06b9e25d07  0xe8be35b7ca08a06d  0xa579356054013796  0x45bd4585b8f92201

I was always under the impression that setting the seed for each case was “bad form.”
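One way to sidestep reseeding the default RNG entirely is to construct an explicit generator per trial and pass it in (a sketch; trial here is a hypothetical stand-in for the Monte Carlo function, and it draws only from the RNG it is handed):

```julia
using Random, Statistics

# Hypothetical stand-in for the Monte Carlo trial; it never touches
# the task-local default RNG, only the generator passed to it.
trial(rng) = mean(randn(rng, 1000))

n_trials = 100
results = zeros(n_trials)
Threads.@threads for j in 1:n_trials
    rng = Xoshiro(j)  # one independent, reproducibly seeded stream per trial
    results[j] = trial(rng)
end
```

Because the stream for trial j depends only on j, the results are independent of how @threads schedules the tasks, and repeated runs are bit-for-bit reproducible.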

So this version of the problem:

using Statistics
using Random
using Printf
using HypothesisTests

Δt = 1e-2 # time step
T = 10^2;
nΔt = Int(T / Δt);
β = 5.0;    # inverse temperature
∇V = x -> 4 * x * (x^2 - 1);
x0 = -1.;

function f(X)
    return Float64(X > -0.1 && X < 0.1)
end


function EM_sampler(x0, β, Δt, nΔt)
    x = x0;
    obs = 0.;
    for i in 1:nΔt
        x += -∇V(x) * Δt + sqrt(2 / β * Δt) * randn();
        obs += f(x)
    end
    obs /= nΔt;
    return obs
end

Random.seed!(100)
n_trials = 10^4;
results = zeros(n_trials);
for j in 1:n_trials
    results[j] = EM_sampler(x0, β, Δt, nΔt);
end
@show confint(OneSampleTTest(results));

Random.seed!(200)
n_trials = 10^4;
results = zeros(n_trials);
Threads.@threads for j in 1:n_trials
    # Random.seed!(j)
    results[j] = EM_sampler(x0, β, Δt, nΔt)
end
@show confint(OneSampleTTest(results));

Behaves as expected, including full reproducibility across repeated runs. Any suggestions on how to troubleshoot the problem?

I narrowed down the issue to the fact that the structure

cfg = ForwardDiff.GradientConfig(...)

is not thread-safe when shared across tasks, and there are obvious ways to get around that in my code.
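For instance, allocating a fresh GradientConfig inside the loop gives each task its own work buffers (a sketch with a stand-in potential; the real code uses SymmetricDoubleWell from TestLandscapes):

```julia
using ForwardDiff

V = x -> (x[1]^2 - 1)^2  # stand-in for SymmetricDoubleWell

n_trials = 100
results = zeros(n_trials)
Threads.@threads for j in 1:n_trials
    # per-iteration config: no two tasks share ForwardDiff's scratch buffers
    cfg = ForwardDiff.GradientConfig(V, zeros(Float64, 1))
    ∇V! = (g, X) -> ForwardDiff.gradient!(g, V, X, cfg)
    g = zeros(1)
    ∇V!(g, [0.5])
    results[j] = g[1]
end
```

This trades a small per-iteration allocation for correctness; since the Monte Carlo trial itself is expensive, the extra cost should be negligible.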