DifferentialEquations.jl uses just one core

The documentation says that EnsembleThreads() uses multithreading:

  • EnsembleThreads() - The default. This uses multithreading. It’s local (single computer, shared memory) parallelism only. Fastest when the trajectories are quick.

But if I write this

import DifferentialEquations
f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
sol = DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=2^8)

and execute it using julia bug2.jl --threads 4, gnome-system-monitor shows that it’s only using one core.
Any ideas how to fix this?

What if you try

julia --threads 4 bug2.jl 
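This follows the standard Unix convention: anything placed after the script name is handed to the script (in Julia, via the global ARGS), not parsed by the julia launcher itself. A generic illustration with a plain shell script:

```shell
# Write a tiny script that just prints the arguments it receives.
cat > /tmp/show_args.sh <<'EOF'
echo "script saw: $@"
EOF

# Arguments placed after the script name reach the script, not the interpreter:
sh /tmp/show_args.sh --threads 4
# → script saw: --threads 4
```

The same thing happens with julia bug2.jl --threads 4: the flag ends up in ARGS and Julia itself stays single-threaded.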

Possibly a stupid suggestion: to monitor the program I would open a terminal window and use ‘top -H’, or even better the htop utility:

https://htop.dev/

Not OP, but I do remember getting stung by something like this before and wanting to report it as a bug. Do you think that would be worth it, or is it by intentional design? ~~I’d think requiring a -- after the file name a reasonable requirement.~~ Damn, I don’t think that would be backwards compatible. It does need to be better documented, though.

Still the same problem:

import DifferentialEquations
f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
@time sol = DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=2^8)
$ time julia --threads 4 bug2.jl
 24.129812 seconds (78.55 M allocations: 20.149 GiB, 6.46% gc time)

real    0m33,589s
user    0m48,884s
sys     0m2,072s

I know that top and htop exist, but that does not change the problem.

Update: The following starts using multiple cores, then drops down to one core when “starting main simulation” is printed.

using Distributed
addprocs(4)
import DifferentialEquations
f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
println("starting main simulation")
flush(stdout)
DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=2^8)
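Note that addprocs adds worker *processes*, which only EnsembleDistributed() can use; EnsembleThreads() draws from the thread pool of the main process, which is fixed at startup. A quick sanity check (a sketch, assuming a default julia launch without --threads):

```julia
using Distributed

addprocs(4)
println(nworkers())           # 4 worker processes, usable by EnsembleDistributed()

# The thread count of the main process is unaffected by addprocs; it is
# fixed when julia starts (--threads N or JULIA_NUM_THREADS=N):
println(Threads.nthreads())   # 1 under a default launch
```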

Let’s perhaps go in the other direction - do you observe a loss in performance when you explicitly specify EnsembleSerial()? According to the DifferentialEquations.jl docs, EnsembleThreads() is the default anyway, so I’d expect it to be slower when explicitly running in serial (perhaps with a more complicated/slower to calculate differential equation, instead of just setting some memory to a constant value).

See Parallel Ensemble Simulations · DifferentialEquations.jl

I wouldn’t be concerned about it using only one core a priori, unless you have some task/OS-thread migration enabled to even the load between physical cores (that’s the default behavior on windows, I don’t know about your linux machine).


EnsembleSerial and EnsembleThreads are equally fast.

I would be concerned if it’s using only one core, because I’m missing out on quite a lot of performance if I run it on a 128 core machine.

Interesting. What if you use EnsembleDistributed() with addprocs(4) and julia script.jl (i.e. no explicit additional threads)? Plus a somewhat slower diffeq, since the overhead of interprocess communication is larger than for threads.

Also, what specs does your machine have? Can you post versioninfo()?

EnsembleDistributed does use multiple cores, but is slower than EnsembleThreads or EnsembleSerial. Both on my 4 core machine and on a 128 core machine with addprocs(128).

4-core machine:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

128-core machine:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7542 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Slower for the minimal “diffeq” that basically only writes a constant to memory? Yes, I don’t think that’s too surprising, since you basically only measure the overhead of serial vs. threaded vs. distributed communication. You really only get an advantage when your code is a bit more expensive than that overhead itself.

It doesn’t just write a constant to memory. It calculates

x = x + f * dt + g * dt * rand()

And communication is only needed whenever a trajectory is finished.
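For reference, the Euler–Maruyama step an SDE solver performs per trajectory looks roughly like the following (a sketch, not DiffEq’s actual internals; note the noise term scales with sqrt(dt) times a normal draw, not dt * rand()):

```julia
# Hand-rolled Euler–Maruyama for a scalar SDE dx = f dt + g dW,
# illustrating how little work each step of this example does.
function em_trajectory(f, g, x0, tspan, dt)
    x, t = x0, first(tspan)
    while t < last(tspan)
        x += f * dt + g * sqrt(dt) * randn()   # dW ~ Normal(0, dt)
        t += dt
    end
    return x
end

em_trajectory(1.0, 1e-3, 0.0, (0.0, 100.0), 0.01)   # drifts to ≈ 100
```

Each step is only a handful of floating-point operations, so the per-trajectory cost here is dominated by allocation and scheduling overhead rather than the maths.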

What code are you running, exactly? What you’ve shown so far really isn’t a very expensive function to calculate at all (even taking the perturbation by nature of an SDE into account). This is what I meant with “just writes to memory”, since presumably all other calculation is equivalent for all SDEs.

So I still think linux not migrating threads between cores is the cause of you not seeing activity on more than one core. I’m guessing by default julia doesn’t let itself be migrated between cores, since it’s an opt-in API, as far as I know. Maybe @ChrisRackauckas can shed some light on what’s happening with DiffEq and threading here though.

I ran this code twice, once with N=1, once with N=64, on a machine with 128 logical cores and 64 physical cores.
Both are approximately equally fast, and after printing starting main simulation, both of them used only one core as confirmed by htop.
This does not make any sense to me.

using Distributed
addprocs(64)

import DifferentialEquations

N = 64  # also ran with N = 1

@everywhere f(du, u, p, t) = du .= 1.
@everywhere g(du, u, p, t) = du .= 1e-3

prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)

println("starting main simulation")
flush(stdout)

@time DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=N)

Did you try setting your threads? What does versioninfo() show? Did you check Threads.nthreads()?

julia> versioninfo()
Julia Version 1.7.0-beta3.0
Commit e76c9dad42 (2021-07-07 08:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, znver3)
Environment:
  JULIA_EDITOR = "C:\Users\accou\AppData\Local\atom\app-1.58.0\atom.exe"  -a
  JULIA_NUM_THREADS = 32
  JULIA_PKG_SERVER = https://neuralsim.juliahub.com

If you set your threads then versioninfo() would tell you, and yours is blank.


Threads.nthreads() is 1 unless I specify --threads 64.
Even with --threads 64 it still runs on 1 core. Without --threads 64, the Distributed parts still run on multiple cores.

My versioninfo() does not change if I run it with --threads 64.

Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7542 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

I didn’t install Julia, I just downloaded the binaries. So you should be able to reproduce it:

curl https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz -o julia.tar.gz
tar -zxvf  julia.tar.gz
julia-1.6.2/bin/julia bug.jl

You’re solving a problem so tiny that it’s faster to just solve it than to pay the thread startup cost. But:

import DifferentialEquations
f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
@time DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=100000)

That shows all cores are used at max capacity just fine (if your threads are set up).
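One way to see the crossover is to time the same ensemble serially and threaded at a large trajectory count (a sketch; timings are machine-dependent, and the first solve per algorithm is a warm-up to exclude compilation):

```julia
import DifferentialEquations

f(du, u, p, t) = du .= 1.
g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)

for alg in (DifferentialEquations.EnsembleSerial(), DifferentialEquations.EnsembleThreads())
    DifferentialEquations.solve(ensembleprob, alg, trajectories=10)   # warm-up
    @time DifferentialEquations.solve(ensembleprob, alg, trajectories=100000)
end
```

At small trajectory counts the two timings should be close (overhead dominates); at large counts the threaded run should pull ahead, scaling with the thread count.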
