Is this the right way of distributing the system of differential equations over multiple nodes on a cluster?

I have a large system of ordinary differential equations, and the right-hand side of the equation is coupled and time-consuming. I have written code that parallelizes the evaluation of the right-hand side over multiple threads and machines, but I am not sure whether this is the right way to do it. Below I have MWE that implements the parallelization. Is this the most efficient way of doing it or is there a better way?

using DifferentialEquations
using Distributed


function eoms!(du, u, p, t)
    fill!(du, 0.0)
    lambda = -1.0
    bins = length(u)
    num_workers = nworkers()
    chunk_size = div(bins, num_workers)
    ranges = [(1 + (i - 1) * chunk_size, min(i * chunk_size, bins)) for i in 1:num_workers]
    futures = [remotecall(parallel_function, w, du[r[1]:r[2]], u[r[1]:r[2]], lambda, r) for (w, r) in zip(workers(), ranges)]
    for (localfuture, r) in zip(futures, ranges)
        local_du = fetch(localfuture)
        du[r[1]:r[2]] .= local_du

@everywhere function parallel_function(du, u, lambda, r)
    for i in 1:(r[2]-r[1]+1)
        du[i] = lambda * u[i]
    return du

tini = 0.0
tfinal = 1.0
deltat = 1.0e-2
tstops = tini:deltat:tfinal
bins = 100
u0 = ones(Float64, bins)

condition1(u, t, integrator) = t ∈ tstops
function affect1!(integrator)
    print(integrator.t, " ")
    for i in 1:bins
        print(integrator.u[i], " ")
cb1 = DiscreteCallback(condition1, affect1!)
cbs = CallbackSet(cb1)
tspan = (tini, tfinal)
p = [] 

prob = ODEProblem(eoms!, u0, tspan, p)
sol = solve(prob, VCABM(), callback=cbs, tstops=tstops, abstol=1.0e-6, reltol=1.0e-6, saveat=tfinal)

That’s fine, you just need to make sure what is distributed is costly enough for this way of doing it to matter.

Thanks a lot. In my case eoms! takes a couple of seconds to calculate on 64 memory-shared threads. I am trying to increase the resolution so it will probably take a few more seconds.