EnsembleThreads() - The default. This uses multithreading. It’s local (single computer, shared memory) parallelism only. Fastest when the trajectories are quick.
But if I write this
import DifferentialEquations
f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
sol = DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=2^8)
and execute it using julia bug2.jl --threads 4, gnome-system-monitor shows that it’s only using one core.
Any ideas how to fix this?
Not OP, but I do remember getting stung by something like this before and wanting to report it as a bug. Do you think that would be worth it, or is it intentional design? ~~I’d think requiring a -- after the file name would be a reasonable requirement.~~ Damn, I don’t think that would be backwards compatible. It does need to be better documented, though.
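For anyone reading along, the pitfall here seems to be argument order: options placed after the script name are handed to the script in ARGS instead of being parsed by julia. A minimal check, assuming a made-up file name check_threads.jl and a hypothetical helper misplaced_thread_flag (neither comes from the original post):

```julia
# check_threads.jl (hypothetical file name, for illustration only)
#   julia --threads 4 check_threads.jl    the flag is parsed by julia
#   julia check_threads.jl --threads 4    the flag ends up in ARGS

# Hypothetical helper: true when a thread flag was handed to the script.
misplaced_thread_flag(args) = any(a -> startswith(a, "--threads") || a == "-t", args)

println("Threads.nthreads() = ", Threads.nthreads())
println("ARGS = ", ARGS)
if misplaced_thread_flag(ARGS)
    @warn "Thread flag found in ARGS; put it before the script name."
end
```

If Threads.nthreads() prints 1 and the flag shows up in ARGS, the threaded ensemble has only one thread to work with.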
Let’s perhaps go in the other direction - do you observe a loss in performance when you explicitly specify EnsembleSerial()? According to the DifferentialEquations.jl docs, EnsembleThreads() is the default anyway, so I’d expect it to be slower when explicitly running in serial (perhaps with a more complicated/slower to calculate differential equation, instead of just setting some memory to a constant value).
I wouldn’t be concerned about it using only one core a priori, unless you have some task/OS-thread migration enabled to even out the load between physical cores (that’s the default behavior on Windows; I don’t know about your Linux machine).
Interesting. What if you use EnsembleDistributed() with addprocs(4) and julia script.jl (i.e. no explicit additional threads)? Also try a somewhat slower diffeq, since the overhead of interprocess communication is larger than for threads.
EnsembleDistributed does use multiple cores, but is slower than EnsembleThreads or EnsembleSerial. This holds both on my 4-core machine and on a 128-core machine with addprocs(128).
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4* (2021-07-14 15:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD EPYC 7542 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Slower for the minimal “diffeq” that basically only writes a constant to memory? Yes, I don’t think that’s too surprising, since you basically only measure the overhead of serial vs. threaded vs. distributed communication. You really only get an advantage when your code is a bit more expensive than that overhead itself.
What code are you running, exactly? What you’ve shown so far really isn’t a very expensive function to calculate at all (even taking the perturbation inherent in an SDE into account). This is what I meant by “just writes to memory”, since presumably all other calculation is equivalent for all SDEs.
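To make the overhead argument concrete, here is a rough sketch that times a serial loop against a threaded loop for a trivial and a heavier workload. The functions cheap, costly, run_serial and run_threaded are made-up stand-ins for a trajectory, not anything from DifferentialEquations.jl, and the timings will depend on the machine and thread count:

```julia
# `cheap` and `costly` are hypothetical stand-ins for one trajectory.
cheap(i) = 1.0
costly(i) = sum(sin(i + k) for k in 1:20_000)

function run_serial(work, n)
    out = Vector{Float64}(undef, n)
    for i in 1:n
        out[i] = work(i)
    end
    return out
end

function run_threaded(work, n)
    out = Vector{Float64}(undef, n)
    Threads.@threads for i in 1:n
        out[i] = work(i)   # each iteration writes its own slot, so no data race
    end
    return out
end

for work in (cheap, costly)
    run_serial(work, 4); run_threaded(work, 4)   # warm up so compilation is not timed
    t_serial   = @elapsed run_serial(work, 256)
    t_threaded = @elapsed run_threaded(work, 256)
    println(work, ": serial ", t_serial, " s, threaded ", t_threaded, " s on ",
            Threads.nthreads(), " thread(s)")
end
```

I would expect the threaded version to pay off only for the heavier workload, and only when Julia was started with more than one thread.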
So I still think Linux not migrating threads between cores is the cause of you not seeing activity on more than one core. I’m guessing that by default Julia doesn’t let itself be migrated between cores, since it’s an opt-in API, as far as I know. Maybe @ChrisRackauckas can shed some light on what’s happening with DiffEq and threading here though.
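One way to separate "Julia is not using several threads" from "the OS shows one busy core" is to record which Julia thread handles each unit of work. A small sketch using only Base; thread_ids is a hypothetical helper, and it says nothing about which physical core a thread runs on:

```julia
# Record which Julia thread handled each of n dummy "trajectories".
function thread_ids(n)
    ids = zeros(Int, n)
    Threads.@threads for i in 1:n
        ids[i] = Threads.threadid()   # id of the thread running this iteration
    end
    return ids
end

ids = thread_ids(2^8)
println("Threads.nthreads() = ", Threads.nthreads())
println("distinct threads used: ", sort(unique(ids)))
```

If only a single thread id is listed, the problem is on the Julia side (thread count at startup) and not in how the system monitor attributes load to cores.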
I ran this code twice, once with N=1, once with N=64, on a machine with 128 logical cores and 64 physical cores.
Both are approximately equally fast, and after printing “starting main simulation”, both of them used only one core, as confirmed by htop.
This does not make any sense to me.
using Distributed
addprocs(64)
import DifferentialEquations
@everywhere f(du, u, p, t) = du .= 1.
@everywhere g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
println("starting main simulation")
flush(stdout)
@time DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=N)
Threads.nthreads() is 1 unless I specify --threads 64.
Even with --threads 64 it still runs on 1 core. Without --threads 64 the other stuff runs on multiple cores.
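As far as I understand it, EnsembleThreads() schedules trajectories over Julia threads in the current process, while EnsembleDistributed() uses the worker processes created by addprocs, so the two settings are independent of each other. A small sketch with the Distributed standard library (the worker count of 2 is an arbitrary choice for illustration) that prints both:

```julia
using Distributed

addprocs(2)   # adds worker processes; it does not add threads to this process

println("workers: ", nworkers())
println("threads in the main process: ", Threads.nthreads())

# Ask every worker process how many threads it was started with.
for w in workers()
    println("worker ", w, " has ", remotecall_fetch(Threads.nthreads, w), " thread(s)")
end
```

With addprocs alone, the main process may still report a single thread, in which case a threaded ensemble has nothing to spread the work over.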
My versioninfo() does not change if I run it with --threads 64.
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD EPYC 7542 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
I didn’t install Julia; I just downloaded the binaries. So you should be able to reproduce it.
curl https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz -o julia.tar.gz
tar -zxvf julia.tar.gz
julia-1.6.2/bin/julia bug.jl