Hello,

I’m trying to improve the performance of my differential equation system, whose right-hand-side function is

```
function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end
```

with `L` and `L_t` two constant sparse matrices, with sizes ranging from 900x900 to 2000x2000 or more. Since they are constant, I previously declared them as global variables, which sped up the problem.

What I need is the different behavior of the system as the frequency `w_l` varies.

I’m already able to solve this problem using the `EnsembleThreads()` method, which parallelizes all the ODEs across my 12 cores (with calculation times of about 1 min for 50 trajectories).

Nevertheless, I have access to two other servers with 12 cores each, for a total of 36 cores. Thus, I turned to the `EnsembleDistributed()` method.
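For reference, the threaded version that already works can be sketched roughly like this (the small random matrices, the frequency grid, and the initial state here are stand-ins for my real operators and scan):

```julia
using DifferentialEquations, SparseArrays

# Stand-ins for the real operators (which are 900x900 sparse or larger)
const L   = sprandn(25, 25, 0.1)
const L_t = sprandn(25, 25, 0.1)

drho_dt(rho, p, t) = (L + p[1] * sin(p[2] * t) * L_t) * rho

w_l_l = collect(range(0.5, 2.0, length=8))   # frequencies to scan
rho0_vec = randn(25)                         # stand-in initial state

prob = ODEProblem(drho_dt, rho0_vec, (0.0, 1.0), [1.0, w_l_l[1]])
prob_func(prob, i, repeat) = remake(prob, p=[prob.p[1], w_l_l[i]])
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)

# One trajectory per frequency, spread over the available threads
sim = solve(ensemble_prob, BS3(), EnsembleThreads(), trajectories=length(w_l_l))
```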

After programming the required code for the Distributed method, which can be summarized as

```
using Distributed
addprocs(11; restrict=false)
addprocs([("test@host1", :auto)], tunnel=true, exename = "/home/test/julia-1.6.1/bin/julia", dir = "/home/test/alberto/Rabi Raman Scattering")
addprocs([("ezio@host2", 12)], tunnel=true, exename = "/home/ezio/julia-1.6.1/bin/julia", dir = "/home/ezio/alberto/Rabi Raman Scattering")
@everywhere include("MyQuantumModule.jl")
@everywhere using .MyQuantumModule
@everywhere using LinearAlgebra
@everywhere using DifferentialEquations
@everywhere using SparseArrays
```

I immediately thought about declaring `L` and `L_t` on all processors (`@everywhere L = $L`), and then simply writing

```
@everywhere function drho_dt(rho, p, t)
    global L, L_t
    A, w_l = p
    return (L + A * sin(w_l * t) * L_t) * rho
end
@everywhere function prob_func(prob, i, repeat)
    remake(prob, p=[prob.p[1], w_l_l[i]])
end
p = [A, w_l]
tspan = (0.0, 15.0 / gam_c)
prob = ODEProblem(drho_dt, rho0_vec, tspan, p)
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
@time sim = solve(ensemble_prob, BS3(), EnsembleDistributed(), trajectories=length(w_l_l))
```

but `L` and `L_t` are very large (despite being sparse), and it takes a lot of time to pass those variables to all the remote processors.
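The interpolation I used looks like this; timing it on a matrix of the real size shows the transfer cost (a minimal local sketch, using a random sparse matrix in place of the real `L` and local workers in place of the remote servers):

```julia
using Distributed, SparseArrays

addprocs(2)                    # local demo workers; the real setup adds remote servers
@everywhere using SparseArrays

L = sprandn(900, 900, 0.05)    # a random sparse matrix of the real size
@time @everywhere L = $L       # serializes L and ships a copy to every worker
```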

My second plan was to build these two matrices independently on each processor:

```
@everywhere begin
    # ... some code to build L and L_t locally, i.e.
    # L = some stuff
    # L_t = some stuff
end
# The block above executes very quickly.
# ... then the same code as before for the differential equation
```
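Concretely, the per-worker construction can be sketched like this (the `spdiagm` placeholders stand in for my real construction, which lives in `MyQuantumModule.jl`, and local workers stand in for the remote servers):

```julia
using Distributed

addprocs(2)                        # local demo; the real setup adds the remote workers too
@everywhere using SparseArrays

# Build the operators on every worker instead of shipping them over the network
@everywhere begin
    const N = 25                             # stand-in dimension
    const L   = spdiagm(0 => fill(-1.0, N))  # placeholder for the real operator
    const L_t = spdiagm(0 => fill(1.0, N))   # placeholder for the drive term
end
```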

With this method I can see all the processors on all the servers working with the `htop` command; however, the solver is much slower compared to the `EnsembleThreads()` method. I tried with 25x25 matrices: the Threads method took about 8 seconds, while the Distributed one took 300!

Am I doing something wrong?