Hello all,
I am given a stochastic simulator and would like to turn it into N deterministic simulators, each of which operates with a fixed seed. The simulators depend on some parameters that the user can specify.
My question is about best practice for using the deterministic simulators in parallel (e.g. in a distributed environment or as different threads) without running into issues. The constraints are that I am not allowed to change the given stochastic simulator, and that I have only limited ability to inspect its internals.
As a simple working example, I am given the following stochastic simulator:
function external_simulator(mu=0, var=1)
    return mu + sqrt(var) * randn()
end
that implements a Gaussian random variable with mean mu and variance var. I am not allowed to change it.
To turn the simulator into a set of deterministic simulators, mathematically, I would draw n_i from a standard normal via randn() and then keep it fixed, so that the deterministic simulators would be g_i(\mu, v) = \mu + \sqrt{v} n_i for i = 1, ..., N, where N is the total number of deterministic simulators.
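As a minimal sketch of what I mean mathematically (this is the construction I cannot use directly, because it needs access to the noise term inside the simulator), the helper name make_fixed_noise_sims and the seed 1234 below are purely illustrative:

```julia
using Random

# Hypothetical helper: draw N standard-normal values once and freeze them.
# The seed 1234 is arbitrary and only makes this sketch reproducible.
function make_fixed_noise_sims(N; seed=1234)
    rng = MersenneTwister(seed)
    noise = randn(rng, N)   # n_1, ..., n_N, drawn once and never redrawn
    # g_i(mu, v) = mu + sqrt(v) * n_i, with n_i captured by each closure
    return [(mu, v) -> mu + sqrt(v) * noise[i] for i in 1:N]
end

g = make_fixed_noise_sims(3)
g[1](0.0, 1.0) == g[1](0.0, 1.0)   # same inputs give the same output
g[1](2.0, 1.0) - g[1](0.0, 1.0)    # approximately 2.0, the change in the mean
```

Each g[i] is fully deterministic because the only random draw happens once, when the simulators are created.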
Since I am not allowed to change the provided stochastic simulator, I currently use the following wrapper (closure):
using Random
function make_deterministic(usersim, id)
    function mysim(params)
        Random.seed!(id)
        return usersim(params...)
    end
    return mysim
end
where id is some identifier (number).
I then use make_deterministic to generate multiple deterministic simulators as follows:
sim1 = make_deterministic(external_simulator, 1);
sim2 = make_deterministic(external_simulator, 2);
# ...
sim100 = make_deterministic(external_simulator, 100);
Calling sim1 for different means:
[sim1((mu,1)) for mu in 1:10]
gives, for example,
0.9294168610461021 1.929416861046102 2.929416861046102 3.929416861046102 4.929416861046102 5.929416861046102 6.929416861046102 7.929416861046102 8.929416861046102 9.929416861046102
which displays the desired behavior: the mean increases while the random part (the digits after the decimal point) stays the same, up to floating-point rounding in the last digit.
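To make the intended behavior explicit, here is a small self-contained check of what I expect from the wrapper when used serially. It repeats the definitions from this post so it can be run on its own; the variable names a and b are just for illustration:

```julia
using Random

# Stand-in for the provided simulator (same as in the example).
function external_simulator(mu=0, var=1)
    return mu + sqrt(var) * randn()
end

# Same wrapper as in the post: reseed the default RNG before every call.
function make_deterministic(usersim, id)
    function mysim(params)
        Random.seed!(id)
        return usersim(params...)
    end
    return mysim
end

sim1 = make_deterministic(external_simulator, 1)
sim2 = make_deterministic(external_simulator, 2)

a = sim1((0, 1))
rand()               # unrelated use of the default RNG in between
b = sim1((0, 1))
a == b               # reseeding restores the same draw

sim1((0, 1)) != sim2((0, 1))   # different ids are expected to give different draws
```

One side effect I am aware of: every call to a wrapped simulator also resets the state of the default RNG for whatever code runs afterwards.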
My questions are:
- Is such an approach safe when used (a) serially on the same computer, (b) as threads on the same computer, (c) in a distributed environment (multiple compute nodes without shared memory)?
- Are there any restrictions on how I should define the id, or can it be any Int?
- Are there any best practices on how to choose the id?
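To make case (b) concrete, this is roughly the threaded usage I have in mind. It is only a sketch of the use case, not something I know to be safe; the names sims, serial and threaded are illustrative, and the definitions are repeated so it runs on its own:

```julia
using Random
using Base.Threads

# Stand-in for the provided simulator (same as in the example).
function external_simulator(mu=0, var=1)
    return mu + sqrt(var) * randn()
end

# Same wrapper as in the post.
function make_deterministic(usersim, id)
    function mysim(params)
        Random.seed!(id)
        return usersim(params...)
    end
    return mysim
end

sims = [make_deterministic(external_simulator, id) for id in 1:100]

# Reference results computed one after another.
serial = [sim((0.0, 1.0)) for sim in sims]

# The same evaluations spread over the available threads.
threaded = Vector{Float64}(undef, length(sims))
@threads for i in eachindex(sims)
    threaded[i] = sims[i]((0.0, 1.0))
end

serial == threaded   # the property I would like to be guaranteed
```

The question is whether the final comparison is guaranteed to hold regardless of the number of threads and of how the provided simulator uses the RNG internally.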
Many thanks for your help!