Hello,
I’m trying to improve the performance of my differential equation system, where the differential function is

```julia
function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end
```

with `L` and `L_t` two constant sparse matrices, with sizes ranging from 900x900 to 2000x2000 or more. Since they are constant, I previously declared them as global variables, which sped up the problem.
What I need is the system’s behavior for different values of the frequency `w_l`.
I’m already able to solve this problem using the `EnsembleThreads()` method, which parallelizes all the ODEs across my 12 cores (with computation times of about 1 min for 50 trajectories). Nevertheless, I have access to two other servers with 12 cores each, for a total of 36 cores. So I turned to the `EnsembleDistributed()` method.
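For reference, my threaded setup looks roughly like the following self-contained sketch (the `sprand` operators, the 100-element state, the time span, and the `w_l_l` sweep values are placeholders; the real `L`, `L_t`, and initial state come from my quantum model and are much larger):

```julia
using DifferentialEquations, SparseArrays

# Placeholder operators standing in for the real L and L_t.
const L_const  = sprand(ComplexF64, 100, 100, 0.01)
const Lt_const = sprand(ComplexF64, 100, 100, 0.01)

drho_dt(rho, p, t) = (L_const + p[1] * sin(p[2] * t) * Lt_const) * rho

w_l_l = range(0.9, 1.1, length=8)        # hypothetical frequency sweep
rho0  = rand(ComplexF64, 100)            # placeholder initial state
prob  = ODEProblem(drho_dt, rho0, (0.0, 1.0), [0.5, w_l_l[1]])

# Each trajectory i gets its own frequency w_l_l[i].
prob_func(prob, i, repeat) = remake(prob, p=[prob.p[1], w_l_l[i]])
ensemble = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
sim = solve(ensemble, BS3(), EnsembleThreads(), trajectories=length(w_l_l))
```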
After writing the required code for the distributed method, which can be summarized as
```julia
using Distributed

addprocs(11; restrict=false)
addprocs([("test@host1", :auto)], tunnel=true, exename="/home/test/julia-1.6.1/bin/julia", dir="/home/test/alberto/Rabi Raman Scattering")
addprocs([("ezio@host2", 12)], tunnel=true, exename="/home/ezio/julia-1.6.1/bin/julia", dir="/home/ezio/alberto/Rabi Raman Scattering")

@everywhere include("MyQuantumModule.jl")
@everywhere using .MyQuantumModule
@everywhere using LinearAlgebra
@everywhere using DifferentialEquations
@everywhere using SparseArrays
```
I immediately thought of declaring `L` and `L_t` on all workers (`@everywhere L = $L`), and then simply writing
```julia
@everywhere function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end

@everywhere function prob_func(prob, i, repeat)
    remake(prob, p=[prob.p[1], w_l_l[i]])
end

p = [A, w_l]
tspan = (0.0, 15.0 / gam_c)
prob = ODEProblem(drho_dt, rho0_vec, tspan, p)
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
@time sim = solve(ensemble_prob, BS3(), EnsembleDistributed(), trajectories=length(w_l_l))
```
but `L` and `L_t` are very large (despite being sparse), and it takes a long time to transfer those variables to all the remote processes.
My second plan was to build these two matrices independently on each processor:

```julia
@everywhere begin
    # ... some code to get L and L_t, so
    L = some stuff
    L_t = some stuff
end
# The code written above is executed very quickly.
```

…and then the same code as above for the differential equation.
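Concretely, the per-worker construction pattern looks something like this sketch (using a single local worker and a hypothetical `sprand` call in place of my real model code; the real construction must be deterministic so every worker builds identical matrices):

```julia
using Distributed
addprocs(1)   # in the real setup: local and remote workers as shown above

@everywhere using SparseArrays

# Build the operators on every worker instead of serializing them from the
# master process. NOTE: sprand here is only a placeholder; unlike the real
# model code, it would produce *different* random matrices on each worker.
@everywhere const L   = sprand(ComplexF64, 900, 900, 0.005)
@everywhere const L_t = sprand(ComplexF64, 900, 900, 0.005)

# Each worker now holds its own copy; nothing large crosses the wire.
sz = remotecall_fetch(() -> size(L), first(workers()))
```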
With this method I can see all the processors working on all the servers with the `htop` command; however, the solver is much slower than with the `EnsembleThreads()` method. I tried with 25x25 matrices: the Threads method took about 8 seconds, while the Distributed one took 300!
Am I doing something wrong?