Oh, I do remember getting stung by something like this before and wanting to report it as a bug. Do you think that would be worth it, or is it intentional by design? ~~I’d think requiring a -- after the file name would be a reasonable requirement.~~ Damn, I don’t think that would be backwards compatible. It does need to be better documented, though.
Let’s perhaps go in the other direction - do you observe a loss in performance when you explicitly specify EnsembleSerial()? According to the DifferentialEquations.jl docs, EnsembleThreads() is the default anyway, so I’d expect it to be slower when explicitly running in serial (perhaps with a more complicated/slower to calculate differential equation, instead of just setting some memory to a constant value).
I wouldn’t be concerned about it using only one core a priori, unless you have some task/OS-thread migration enabled to even out the load between physical cores (that’s the default behavior on Windows, I don’t know about your Linux machine).
Interesting. What if you use EnsembleDistributed() with addprocs(4) and julia script.jl (i.e. no explicit additional threads)? Also try a somewhat slower diffeq, since the overhead of interprocess communication is larger than for threads.
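In case it helps, here is a rough sketch of what that distributed run could look like. It reuses the f and g from the original post; the explicit SRIW1() solver, the ntraj name, and the trajectory count are my own choices for illustration, not something taken from the thread.

```julia
using Distributed
addprocs(4)                               # 4 worker processes, as suggested above

@everywhere using DifferentialEquations   # load the package on every worker

# Same trivial drift and noise terms as in the original post
@everywhere f(du, u, p, t) = du .= 1.0
@everywhere g(du, u, p, t) = du .= 1e-3

prob = SDEProblem(f, g, [0.0], (0.0, 100.0))
ensembleprob = EnsembleProblem(prob)

ntraj = 64                                # assumed trajectory count
println("workers: ", nworkers())
# SRIW1() is an assumed solver choice; the original call lets the default pick one
sol = @time solve(ensembleprob, SRIW1(), EnsembleDistributed(), trajectories = ntraj)
```

Run it as plain julia script.jl and watch htop while it solves. With a diffeq this cheap I would still expect the process startup and communication cost to dominate.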
Slower for the minimal “diffeq” that basically only writes a constant to memory? Yes, I don’t think that’s too surprising, since you basically only measure the overhead of serial vs. threaded vs. distributed communication. You really only get an advantage when your code is a bit more expensive than that overhead itself.
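To make the overhead argument concrete, here is a small sketch that uses only Base.Threads and no DiffEq at all. The cheap and expensive functions are hypothetical stand-ins for a trivial and a costly per-trajectory workload, and the sizes are arbitrary.

```julia
using Base.Threads

# Hypothetical stand-ins for a cheap and an expensive per-trajectory workload
cheap(i) = 1.0
expensive(i) = sum(sin(i * k) for k in 1:200_000)

# Plain serial loop over n work items
function run_serial(work, n)
    out = Vector{Float64}(undef, n)
    for i in 1:n
        out[i] = work(i)
    end
    return out
end

# Same loop, split across the available Julia threads
function run_threaded(work, n)
    out = Vector{Float64}(undef, n)
    @threads for i in 1:n
        out[i] = work(i)
    end
    return out
end

n = 64
for (name, work) in (("cheap", cheap), ("expensive", expensive))
    run_serial(work, n); run_threaded(work, n)   # warm up (compile) before timing
    t_serial = @elapsed run_serial(work, n)
    t_threaded = @elapsed run_threaded(work, n)
    println(name, ": serial ", t_serial, " s, threaded ", t_threaded, " s on ",
            nthreads(), " thread(s)")
end
```

For the cheap case the threaded version should show no real gain and may well be slower. Any speedup should only show up for the expensive case, and only if Julia was started with more than one thread.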
What code are you running, exactly? What you’ve shown so far really isn’t a very expensive function to calculate at all (even taking the perturbation by nature of an SDE into account). This is what I meant by “just writes to memory”, since presumably all other calculation is equivalent for all SDEs.
So I still think Linux not migrating threads between cores is the cause of you not seeing activity on more than one core. I’m guessing that by default Julia doesn’t let itself be migrated between cores, since it’s an opt-in API, as far as I know. Maybe @ChrisRackauckas can shed some light on what’s happening with DiffEq and threading here though.
I ran this code twice, once with N=1, once with N=64, on a machine with 128 logical cores and 64 physical cores.
Both are approximately equally fast, and after printing “starting main simulation”, both of them used only one core, as confirmed by htop.
This does not make any sense to me.
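```julia
using Distributed              # provides @everywhere
import DifferentialEquations   # names are used fully qualified below

N = 64  # number of trajectories; the two runs used N=1 and N=64

@everywhere f(du, u, p, t) = du .= 1.
@everywhere g(du, u, p, t) = du .= 1e-3
prob = DifferentialEquations.SDEProblem(f, g, [0.], (0., 100.))
ensembleprob = DifferentialEquations.EnsembleProblem(prob)
println("starting main simulation")
@time DifferentialEquations.solve(ensembleprob, DifferentialEquations.EnsembleThreads(), trajectories=N)
```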
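One thing that might be worth ruling out before digging deeper (this is a guess, nothing above confirms it): EnsembleThreads() can only use the threads Julia was started with, and Julia starts with a single thread unless told otherwise, e.g. via julia -t 4 script.jl or the JULIA_NUM_THREADS environment variable. A minimal check using only Base.Threads, where the array size of 64 is arbitrary:

```julia
using Base.Threads

println("Julia threads available: ", nthreads())

# Record which thread handles each iteration of a threaded loop
ids = zeros(Int, 64)
@threads for i in eachindex(ids)
    ids[i] = threadid()
end
println("distinct threads used: ", length(unique(ids)))
```

If this reports a single thread, then seeing only one busy core in htop would be expected regardless of N.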