I have an elementary Monte Carlo computation that I want to do that amounts to:

results = zeros(n_trials);
for i in 1:n_trials
    results[i] = # relatively expensive Monte Carlo function
end

I’m finding that the statistics of the data in results differ when this is executed serially (as above) from when it’s executed as a Threads.@threads for loop. I know there was a change in how random numbers were handled in 1.7, and I’m wondering how to accomplish this correctly.
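Concretely, I think what I’m after is something like the following (a sketch; `mc_fun` and the seed choice are hypothetical stand-ins for my actual function), where each trial gets its own explicitly seeded RNG so the answer can’t depend on the thread schedule. Is this the right pattern?

```julia
using Random

# hypothetical stand-in for the expensive Monte Carlo function;
# it draws only from the RNG it is handed
mc_fun(rng) = sum(randn(rng, 100)) / 100

n_trials = 1000
results = zeros(n_trials)

# one independently seeded generator per trial, fixed up front
rngs = [Xoshiro(1234 + i) for i in 1:n_trials]

Threads.@threads for i in 1:n_trials
    results[i] = mc_fun(rngs[i])
end
```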

I’m fairly certain it’s a threading issue: if I write the problem using Distributed with an @sync @distributed for loop, I get results that are statistically consistent with the serial case.

I think this question requires more details, for example whether you were explicitly passing an RNG to your Monte Carlo function or whether the evaluation depended on the GLOBAL_RNG state.

This is bizarre, but I’m now getting a different answer for the threaded loop, one that’s consistent with the serial result. To answer your question: changing seeds does change the result, but in ways that are not too far off.

As a follow-up, I find that as soon as I have more than one thread running, there is no reproducibility in the threaded loop, despite setting the seed explicitly. I’ve verified this on three machines (two Intel Macs and an Intel Linux box), all running 1.7.1.

Are you sure this sample_observable function does not modify the input variables? If it does, you may want to initialize them inside the for loop for each trial.

The task RNGs are seeded from RandomDevice, which is not very performant. So the tasks end up with the same seeds, and your random calls produce correlated input.

julia> n = 10;

julia> z = zeros(UInt64, n, 4);

julia> Threads.@threads for i in 1:n
           z[i,1] = current_task().rngState0
           z[i,2] = current_task().rngState1
           z[i,3] = current_task().rngState2
           z[i,4] = current_task().rngState3
       end

julia> z
10×4 Matrix{UInt64}:
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0xb5a7557e2fdbff28  0x09db5a1abfd86d1b  0xf96e830a211194df  0x31e7c2e8c4c98f4b
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x2ca707c0d3b0bc36  0x6e9628ceb842c885  0x1395cbd4a3f59c0e  0xc977534c56a8f9f5
 0x46f1b23c9ca15399  0xf4dcd5ebcc1bba76  0x0b938b850dd2f860  0x01507390a13082df
 0x46f1b23c9ca15399  0xf4dcd5ebcc1bba76  0x0b938b850dd2f860  0x01507390a13082df
 0x414ebc7669ce613e  0xae1cacc9bacbc959  0x32ddb1fc917c932a  0x7ec97f6600242766
 0x414ebc7669ce613e  0xae1cacc9bacbc959  0x32ddb1fc917c932a  0x7ec97f6600242766
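The duplicated rows above line up with iterations that ran in the same task: `Threads.@threads` splits `1:n` into one chunk per thread, and every iteration in a chunk reads the same (unadvanced) task-local state. A quick way to check this (a sketch) is to record the task identity alongside:

```julia
# record which task ran each iteration; iterations that share a task
# also share one task-local RNG state
n = 10
ids = zeros(UInt64, n)
Threads.@threads for i in 1:n
    ids[i] = objectid(current_task())
end

# at most one task per thread, so at most Threads.nthreads() distinct ids
@show length(unique(ids))
```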

using Statistics
using Random
using Printf
using HypothesisTests

Δt = 1e-2;  # time step
T = 10^2;
nΔt = Int(T / Δt);
β = 5.0;  # inverse temperature
∇V = x -> 4 * x * (x^2 - 1);
x0 = -1.0;

function f(X)
    return Float64(X > -0.1 && X < 0.1)
end

function EM_sampler(x0, β, Δt, nΔt)
    x = x0;
    obs = 0.0;
    for i in 1:nΔt
        x += -∇V(x) * Δt + sqrt(2 / β * Δt) * randn();
        obs += f(x)
    end
    obs /= nΔt;
    return obs
end

# serial loop
Random.seed!(100)
n_trials = 10^4;
results = zeros(n_trials);
for j in 1:n_trials
    results[j] = EM_sampler(x0, β, Δt, nΔt);
end
@show confint(OneSampleTTest(results));

# threaded loop
Random.seed!(200)
n_trials = 10^4;
results = zeros(n_trials);
Threads.@threads for j in 1:n_trials
    # Random.seed!(j)
    results[j] = EM_sampler(x0, β, Δt, nΔt)
end
@show confint(OneSampleTTest(results));

This behaves as expected, including full reproducibility across repeated runs. Any suggestions on how to troubleshoot the problem?
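If the goal is bitwise reproducibility independent of the thread count, one option (a sketch reusing the MWE’s definitions; the per-trial seed offset is arbitrary) is to pass an explicit, per-trial RNG into the sampler instead of relying on the task-local generator:

```julia
using Random

∇V(x) = 4 * x * (x^2 - 1)
f(x) = Float64(-0.1 < x < 0.1)

# same Euler–Maruyama sampler as in the MWE, but drawing only from the
# RNG that is passed in, never from the task-local generator
function EM_sampler(rng, x0, β, Δt, nΔt)
    x = x0
    obs = 0.0
    for i in 1:nΔt
        x += -∇V(x) * Δt + sqrt(2 / β * Δt) * randn(rng)
        obs += f(x)
    end
    return obs / nΔt
end

n_trials = 10^4
results = zeros(n_trials)
Threads.@threads for j in 1:n_trials
    # one deterministically seeded RNG per trial, so the answer no
    # longer depends on which thread runs which iteration
    results[j] = EM_sampler(Xoshiro(200 + j), -1.0, 5.0, 1e-2, 10^4)
end
```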