I was experimenting with Threads.@threads, as I am trying to parallelize a for loop: I want to run taylorinteg from TaylorIntegration.jl several times, independently, for different initial conditions. To do this, I wrote an MWE reproducing the Kepler example shown on Jupyter Notebook Viewer.
Here is the code:
using TaylorIntegration
const μ = 1.0
const q0 = [0.19999999999999996, 0.0, 0.0, 3.0] # an initial condition for elliptical motion
const order = 28
const t0 = 0.0
const t_max = 10*(2π) # we are just taking a wild guess about the period ;)
const abs_tol = 1.0E-20
const steps = 500000
const r_p3d2 = TaylorSeries.Taylor1{Float64} # note: shadowed by the local r_p3d2 inside kepler!, so effectively unused
# the equations of motion for the Kepler problem:
function kepler!(dq, q, params, t)
    r_p3d2 = (q[1]^2 + q[2]^2)^(3/2)
    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -μ*q[1]/r_p3d2
    dq[4] = -μ*q[2]/r_p3d2
    nothing
end

# one independent integration; returns the final time reached
function task()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps)
    return t[end]
end

# threaded driver: run x independent integrations with Threads.@threads
function f_par(x)
    xn = zeros(x)
    Threads.@threads for i in 1:x
        xn[i] = task()
    end
    return nothing
end

# serial reference version
function f(x)
    xn = zeros(x)
    for i in 1:x
        xn[i] = task()
    end
    return nothing
end
However, when I time both versions with @time, the parallelized one (I use 72 CPU cores on the server) isn't much faster, mainly because of GC time.
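A comparison along these lines might be invoked as in the sketch below; the thread count and the number of integrations are placeholders rather than the exact values from my runs, and Julia has to be started with threads enabled (e.g. julia -t 72):
f(1); f_par(1)              # warm-up calls so compilation is not included in the timings

@show Threads.nthreads()    # confirm how many threads the session actually has
n = 72                      # number of independent integrations (placeholder)
@time f(n)                  # serial loop
@time f_par(n)              # Threads.@threads loop; @time also reports the GC fraction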
One more problem: the GC time varies greatly and seemingly at random (from 3% to 67% for the non-parallelized version, for instance). Can you tell whether this is a consequence of server activity, of my code, or something internal to taylorinteg?
With more threads running concurrently, the allocation rate (allocations per unit time) will increase to the point where the GC basically cannot keep up. I think the library needs to be optimized a bit to reduce the number of allocations.
The GC uses various heuristics related to the age of objects, the total memory allocated, etc., so it isn't too surprising that the GC fraction varies between runs.
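One way to check this on your side is to measure how much a single task() call allocates, and to force a full collection before each timed run so that runs start from a similar GC state. A rough diagnostic sketch:
task()            # run once first so compilation is excluded
@allocated task() # bytes allocated by a single integration

GC.gc()           # force a full collection before timing
@timev task()     # @timev reports elapsed time, allocation counts and GC time separately
If the allocation count per call stays large, the threaded loop multiplies it by the number of threads, which matches the GC-bound behaviour you are seeing.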
However, if you use the @taylorize macro to parse your ODEs, things improve considerably:
using TaylorIntegration
const μ = 1.0
const q0 = [0.19999999999999996, 0.0, 0.0, 3.0] # an initial condition for elliptical motion
const order = 28
const t0 = 0.0
const t_max = 10*(2π) # we are just taking a wild guess about the period ;)
const abs_tol = 1.0E-20
const steps = 500000
@taylorize function kepler!(dq, q, params, t)
    r_p3d2 = (q[1]^2 + q[2]^2)^(3/2)
    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -(μ*q[1])/r_p3d2 # parentheses needed to help `@taylorize`
    dq[4] = -(μ*q[2])/r_p3d2
    nothing
end

# uses the parsed method generated by @taylorize (parse_eqs=true by default)
function task()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps)
    return t[end]
end

# same integration, but without the parsed method
function task_noparse()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps, parse_eqs=false)
    return t[end]
end
Then I get
@time task() # second run of task()
0.003598 seconds (2.31 k allocations: 19.578 MiB)
@time task_noparse() # second run of task_noparse()
0.144778 seconds (198.42 k allocations: 52.445 MiB, 64.56% gc time)
The allocated memory is reduced by roughly a factor of 2.7, the number of allocations by nearly two orders of magnitude, and the elapsed time by a factor of about 40. I am using Julia 1.8 and TaylorIntegration v0.9.1.
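With the parsed kepler!, the same f and f_par drivers from the first post should scale much better across threads. A sketch of re-running the comparison (the thread count and loop size below are placeholders; start Julia with e.g. julia -t 8):
f(1); f_par(1)   # warm up so compilation is not timed
@time f(16)      # serial loop over 16 independent integrations
@time f_par(16)  # Threads.@threads loop over the same work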
Quite a late reply, but it did get much better this way! Using this in the f and f_par functions, with 8 threads, I now get about a 4-5× speedup with the parallelized version, as expected. GC also takes about 7% of the time in both cases. Case solved, thank you!