I’m trying to improve the performance of my differential equation system, where the differential function is

```julia
function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end
```
Here `L` and `L_t` are two constant sparse matrices, with sizes ranging from 900×900 to 2000×2000 or more. Since they are constant, I previously declared them as global variables, which sped the problem up.
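As a side note on the globals: in Julia, untyped non-`const` globals force dynamic dispatch inside the RHS function, so declaring the matrices `const` (or passing them through `p`) tends to be faster. A minimal sketch, where the matrix sizes and densities are stand-ins, not the real operators from my problem:

```julia
using SparseArrays

# Hypothetical small stand-ins for the real L and L_t (the actual
# matrices are built elsewhere in the program).
const L   = sprand(ComplexF64, 100, 100, 0.01)
const L_t = sprand(ComplexF64, 100, 100, 0.01)

# With `const` globals the compiler knows the types, so this compiles
# to fast sparse mat-vec products instead of dynamic global lookups.
function drho_dt(rho, p, t)
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end

rho0 = rand(ComplexF64, 100)
drho = drho_dt(rho0, (1.0, 2.0), 0.1)
```

Note that `(L + A * sin(w_l * t) * L_t)` still allocates a new sparse matrix on every call; computing the two mat-vec products separately avoids that allocation.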
What I need is to study the different behaviors of the system as the frequency `w_l` varies.
I’m already able to solve this problem using the
`EnsembleThreads()` method, which parallelizes all the ODEs across my 12 cores (with computation times of about 1 min for 50 trajectories).
Nevertheless, I have access to two other servers with 12 cores each, for a total of 36 cores. Thus, I turned to the `EnsembleDistributed()` method.
After writing the required code for the Distributed setup, which can be summarized as
```julia
using Distributed

addprocs(11; restrict=false)
addprocs([("test@host1", :auto)], tunnel=true,
         exename="/home/test/julia-1.6.1/bin/julia",
         dir="/home/test/alberto/Rabi Raman Scattering")
addprocs([("ezio@host2", 12)], tunnel=true,
         exename="/home/ezio/julia-1.6.1/bin/julia",
         dir="/home/ezio/alberto/Rabi Raman Scattering")

@everywhere include("MyQuantumModule.jl")
@everywhere using .MyQuantumModule
@everywhere using LinearAlgebra
@everywhere using DifferentialEquations
@everywhere using SparseArrays
```
I immediately thought about declaring `L` and `L_t` on all processors (via
`@everywhere L = $L`), and then simply writing
```julia
@everywhere function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end

@everywhere function prob_func(prob, i, repeat)
    remake(prob, p=[prob.p[1], w_l_l[i]])  # keep A, swap in the i-th frequency
end

p = [A, w_l]
tspan = (0.0, 15.0 / gam_c)
prob = ODEProblem(drho_dt, rho0_vec, tspan, p)
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
@time sim = solve(ensemble_prob, BS3(), EnsembleDistributed(),
                  trajectories=length(w_l_l))
```
However, `L` and `L_t` are very large (despite being sparse), and it took a lot of time to pass those variables to all the remote processors.
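The cost here is that `@everywhere L = $L` serializes the whole matrix and ships it once per worker, over SSH tunnels for the remote hosts. A rough probe of the payload size, using a hypothetical density of 1% (the real density of my operators may differ):

```julia
using SparseArrays

# Hypothetical stand-in for the real L: 2000x2000 with ~1% nonzeros.
# This is only an order-of-magnitude probe, not the actual operator.
L = sprand(ComplexF64, 2000, 2000, 0.01)

# Everything counted below must be serialized and sent once per worker
# when interpolating the matrix into `@everywhere L = $L`.
payload_bytes = Base.summarysize(L)
println("≈ $(round(payload_bytes / 1024^2, digits=2)) MiB per worker, per matrix")
```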
My second plan was to build these two matrices independently on each processor:
```julia
@everywhere begin
    # ... some code to get L and L_t, so
    L = ...   # some stuff
    L_t = ... # some stuff
end
# The code written above is executed very quickly.
```

...and then the same code for the differential equation.
With this method I can see all the processors working on all the servers with the
`htop` command; however, the solver is much slower than with the
`EnsembleThreads()` method. I tried with 25×25 matrices: the Threads method took about 8 seconds, while the Distributed one took 300!
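One plausible explanation, which I wanted to check: each distributed trajectory pays inter-process serialization and messaging, and for a tiny 25×25 system that overhead can dwarf the actual solve. A minimal sketch measuring the round-trip cost with the `Distributed` stdlib (local workers only; the names here are illustrative, and remote hosts would add network latency on top):

```julia
using Distributed

addprocs(2)  # local workers; remote hosts would be strictly slower
@everywhere work(x) = sum(abs2, x)   # stand-in for one tiny ODE solve

x = rand(25 * 25)                    # payload comparable to a 25x25 state

# Warm up both paths so compilation time is excluded from the timings.
work(x)
remotecall_fetch(work, workers()[1], x)

local_t  = @elapsed work(x)
remote_t = @elapsed remotecall_fetch(work, workers()[1], x)

# For work this small, the remote call is dominated by serialization
# and messaging, not by the computation itself.
println("local: $(local_t)s, remote: $(remote_t)s")
rmprocs(workers())
```

If this is the cause, the overhead should shrink relative to the solve time once the matrices are at their real 900×900-to-2000×2000 sizes, or when trajectories are grouped with the ensemble interface's `batch_size` keyword.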
Am I doing something wrong?