Parallel Monte Carlo Simulations

I am simulating a large number of SDEs and looking at the results. Things start to take quite a while, and since my computer has 8 cores I figured I might as well try to use them. Unfortunately, I cannot get things to work.
(I am running all of this in a Jupyter notebook. The computer is a laptop with a good processor, lots of RAM, and 8 cores.)

First I have tried whether I can use parallel computing at all. This is my code.
Adding the cores and loading the package:

addprocs(7)
@everywhere using DifferentialEquations
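
As a quick sanity check (nothing fancy, just Base functions) I also confirm that the workers are really there:

nprocs()   # should return 8: the master process plus the 7 added workers
workers()  # the ids of the 7 worker processes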

Preparing the functions for solving the SDEs (in this case more or less the stochastic Lorenz example from the docs):

@everywhere function lorenz(du,u,p,t)
  du[1] = 10.0(u[2]-u[1])
  du[2] = u[1]*(28.0-u[3]) - u[2]
  du[3] = u[1]*u[2] - (8/3)*u[3]
end
@everywhere function σ_lorenz(du,u,p,t)
  du[1,1] = 3.0
  du[1,2] = 0.0
  du[2,1] = 3.0
  du[2,2] = 3.0
  du[3,1] = 0.0
  du[3,2] = 3.0
end
@everywhere function get_solution()
    prob_sde_lorenz = SDEProblem(lorenz,σ_lorenz,[1.0,0.0,0.0],(0.0,100.0),noise_rate_prototype=zeros(3,2))
    sol = solve(prob_sde_lorenz,ImplicitEM(),dt=0.0005)
end

Next, functions for running the simulations with and without several cores:

function monte_single()
    sols = Vector{Any}(80)
    for i = 1:80
        sols[i] = get_solution()
    end
    return sols
end
function monte_multi()
    sols = Vector{Any}(80)
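    # @parallel splits the loop iterations over the workers; (+) is the reducer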
    @parallel (+) for i = 1:80
        sols[i] = get_solution()
    end
    return sols
end

Finally I try running it:

@time monte_single();
@time monte_multi();

I try a few times and it seems like the multi version is a few times faster than the single one. Using top in the terminal I can confirm that multiple Julia processes are running. So far so good.
(they take about 30s and 6s, respectively)

Next I try using the Monte Carlo tool provided by the DiffEq package:

prob_sde_lorenz = SDEProblem(lorenz,σ_lorenz,[1.0,0.0,0.0],(0.0,100.0),noise_rate_prototype=zeros(3,2))
monte_prob = MonteCarloProblem(prob_sde_lorenz)
@time sim = solve(monte_prob,ImplicitEM(),dt=0.0005,num_monte=80)

This is slower than the single-core version I did earlier (it takes around 50s). Checking with top in the terminal, most of the time only a single Julia process is running, with just some short stretches where several are.
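
For the later attempts I pass the parallel option as a keyword to solve. If I read the Monte Carlo docs right, the call looks something like this (with :threads as an example):

@time sim = solve(monte_prob,ImplicitEM(),dt=0.0005,num_monte=80,parallel_type=:threads)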

I have tried several of the parallel_type options. Trying parallel_type = :threads seems to mess up my computer, making everything slow. The top window does not show anything special, only one Julia process. Eventually I get something like:

Worker 9 terminated.
343.104533 seconds (92.12 M allocations: 4.801 GiB, 47.67% gc time)
Worker 13 terminated.ERROR (unhandled task failure): EOFError: read end of file
Worker 14 terminated.ERROR (unhandled task failure): EOFError: read end of file
Worker 11 terminated.ERROR (unhandled task failure): EOFError: read end of file
Worker 7 terminated.ERROR (unhandled task failure): EOFError: read end of file
ERROR (unhandled task failure): EOFError: read end of file

MonteCarloSolution Solution of length 80 with uType:
DiffEqBase.RODESolution{Float64,2,Array{Array{Float64,1},1},Void,Void,Array{Float64,1},DiffEqNoiseProcess.NoiseProcess{Float64,2,Float64,Array{Float64,1},Void,Void,DiffEqNoiseProcess.#INPLACE_WHITE_NOISE_DIST,DiffEqNoiseProcess.#INPLACE_WHITE_NOISE_BRIDGE,true,DataStructures.Stack{Tuple{Float64,Array{Float64,1},Void}},ResettableStacks.ResettableStack{Tuple{Float64,Array{Float64,1},Void}},DiffEqNoiseProcess.RSWM{:RSwM1,Float64},RandomNumbers.Xorshifts.Xoroshiro128Plus},DiffEqBase.SDEProblem{Array{Float64,1},Float64,true,Void,Void,Void,#lorenz,#σ_lorenz,Void,UniformScaling{Int64},Array{Float64,2}},StochasticDiffEq.ImplicitEM{0,true,DiffEqBase.LinSolveFactorize{Base.LinAlg.#lufact!},DataType,Void,Void,Float64,:Predictive},StochasticDiffEq.LinearInterpolationData{Array{Array{Float64,1},1},Array{Float64,1}}}

This does not seem that successful. Next I try (with no more success) parallel_type = :parfor. Top does show several active Julia processes, and the computer does turn sluggish while running. The result is something like:

Worker 8 terminated.ERROR (unhandled task failure): ProcessExitedException()
Stacktrace:
 [1] try_yieldto(::Base.##296#297{Task}, ::Task) at ./event.jl:189
 [2] wait() at ./event.jl:234
 [3] wait(::Condition) at ./event.jl:27
 [4] wait_impl(::Channel{Any}) at ./channels.jl:364
 [5] wait(::Channel{Any}) at ./channels.jl:360
 [6] take_buffered at ./channels.jl:319 [inlined]
 [7] take!(::Channel{Any}) at ./channels.jl:317
 [8] #remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:350
 [9] remotecall_fetch(::Function, ::Base.Distributed.Worker, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:346
 [10] #remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:367
 [11] remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:367
 [12] (::Base.Distributed.##155#156{Base.#vcat,DiffEqMonteCarlo.##5#8{DiffEqBase.MonteCarloProblem{DiffEqBase.SDEProblem{Array{Float64,1},Float64,true,Void,Void,Void,#lorenz,#σ_lorenz,Void,UniformScaling{Int64},Array{Float64,2}},DiffEqBase.##205#211,DiffEqBase.##204#210,DiffEqBase.##206#212,Array{Any,1}},StochasticDiffEq.ImplicitEM{0,true,DiffEqBase.LinSolveFactorize{Base.LinAlg.#lufact!},DataType,Void,Void,Float64,:Predictive},Tuple{Tuple{Symbol,Float64}}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./distributed/macros.jl:144

Worker 3 terminated.ERROR (unhandled task failure): EOFError: read end of file
ERROR (unhandled task failure): ProcessExitedException()
Stacktrace:
 [1] try_yieldto(::Base.##296#297{Task}, ::Task) at ./event.jl:189
 [2] wait() at ./event.jl:234
 [3] wait(::Condition) at ./event.jl:27
 [4] wait_impl(::Channel{Any}) at ./channels.jl:364
 [5] wait(::Channel{Any}) at ./channels.jl:360
 [6] take_buffered at ./channels.jl:319 [inlined]
 [7] take!(::Channel{Any}) at ./channels.jl:317
 [8] #remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:350
 [9] remotecall_fetch(::Function, ::Base.Distributed.Worker, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:346
 [10] #remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:367
 [11] remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N} where N) at ./distributed/remotecall.jl:367
 [12] (::Base.Distributed.##155#156{Base.#vcat,DiffEqMonteCarlo.##5#8{DiffEqBase.MonteCarloProblem{DiffEqBase.SDEProblem{Array{Float64,1},Float64,true,Void,Void,Void,#lorenz,#σ_lorenz,Void,UniformScaling{Int64},Array{Float64,2}},DiffEqBase.##205#211,DiffEqBase.##204#210,DiffEqBase.##206#212,Array{Any,1}},StochasticDiffEq.ImplicitEM{0,true,DiffEqBase.LinSolveFactorize{Base.LinAlg.#lufact!},DataType,Void,Void,Float64,:Predictive},Tuple{Tuple{Symbol,Float64}}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./distributed/macros.jl:144

ERROR (unhandled task failure): EOFError: read end of file

Next I try parallel_type = :split_threads. Here top only shows a single process and the computer is more or less unaffected while running. However, it keeps going for minutes without actually returning anything. The same happens for parallel_type = :none.

In the end this is also what happens for parallel_type = :pmap. This should be the default, so I would have expected it to give the same result as my first attempts. By this time I am confused enough not to be overly surprised. Admittedly this is not my area of expertise, and I am unsure whether this experimenting with the various parallel_type options is actually helpful.

Something odd is going on, though. My first example makes me believe that running in parallel does work on this computer. I am also fairly convinced that I could use that to write my own parallel Monte Carlo function (see the sketch below). However, it feels silly to do that when one has already been implemented by people who I presume know this much better than me. It should be better to use that one in some proper way.
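
For reference, the hand-rolled version I have in mind would be something like this (just a sketch using pmap from Base; it assumes the @everywhere definitions from above are already loaded on the workers):

function monte_pmap()
    # pmap sends each get_solution() call to a free worker and
    # collects the returned solutions back on the master process
    return pmap(i -> get_solution(), 1:80)
end

@time sols_pmap = monte_pmap();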
