GC time issues when parallelizing taylorinteg?

Alseidon · May 4, 2022, 1:56pm

Hi all,

I was experimenting with Threads.@threads to parallelize a for loop: I want to compute taylorinteg from TaylorIntegration.jl several times, independently, for different initial conditions. To do this, I wrote an MWE reproducing the Kepler example in Jupyter Notebook Viewer.

Here is the code:

using TaylorIntegration

const μ = 1.0
const q0 = [0.19999999999999996, 0.0, 0.0, 3.0] # an initial condition for elliptical motion
const order = 28
const t0 = 0.0
const t_max = 10*(2π) # we are just taking a wild guess about the period ;)
const abs_tol = 1.0E-20
const steps = 500000

const r_p3d2 = TaylorSeries.Taylor1{Float64}

#the equations of motion for the Kepler problem:
function kepler!(dq, q, params, t)
    r_p3d2 = (q[1]^2+q[2]^2)^(3/2)
    
    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -μ*q[1]/r_p3d2
    dq[4] = -μ*q[2]/r_p3d2
    
    nothing
end

function task()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps)
    return t[end]
end

function f_par(x)
    xn = zeros(x)
    Threads.@threads for i in 1:x
        xn[i] = task()
    end
    return nothing
end


function f(x)
    xn = zeros(x)
    for i in 1:x
        xn[i] = task()
    end
    return nothing
end

However, when I time it with @time, the parallelized version (I use 72 CPU cores on the server) isn’t much faster, mainly because of GC time:

julia> @time f(1000)
 27.500834 seconds (198.44 M allocations: 55.672 GiB, 3.21% gc time)

julia> @time f_par(1000)
 26.797330 seconds (198.44 M allocations: 55.672 GiB, 76.07% gc time)

One more problem: the GC time varies greatly, seemingly at random (from 3% to 67% for the non-parallelized version, for instance). Can you tell whether this is a consequence of server activity, of my code, or of something internal to taylorinteg?

(This is on Julia 1.6.5)

Thanks!

kristoffer.carlsson · May 4, 2022, 2:04pm

The TaylorIntegration library seems to do a lot of allocations:

julia> @time task()
  0.018372 seconds (198.46 k allocations: 52.447 MiB, 14.05% gc time)
62.83185307179586

With more threads running concurrently, the allocations per unit time will increase to the point where the GC basically cannot keep up. I think the library needs to be optimized a bit to reduce the number of allocations.
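
To check how much a single call allocates and what share of its time goes to the GC, `@timed` from Base returns the same numbers that `@time` prints. The following is only a minimal sketch: `alloc_heavy` and `alloc_light` are hypothetical stand-in functions, not TaylorIntegration code, and they merely illustrate how reusing a buffer lowers the allocation volume the GC has to keep up with.

```julia
# Hypothetical stand-in for an allocation-heavy task:
# a fresh 64-element vector is allocated on every iteration.
function alloc_heavy(n)
    total = 0.0
    for i in 1:n
        v = fill(Float64(i), 64)
        total += sum(v)
    end
    return total
end

# Same arithmetic, but one preallocated buffer is reused.
function alloc_light(n)
    total = 0.0
    v = Vector{Float64}(undef, 64)
    for i in 1:n
        fill!(v, Float64(i))
        total += sum(v)
    end
    return total
end

# Warm up so that compilation is not included in the measurement.
alloc_heavy(10)
alloc_light(10)

# `@timed` returns a named tuple with fields such as value, time, bytes, gctime.
heavy = @timed alloc_heavy(100_000)
light = @timed alloc_light(100_000)

println("heavy: ", heavy.bytes, " bytes, GC share = ", heavy.gctime / heavy.time)
println("light: ", light.bytes, " bytes, GC share = ", light.gctime / light.time)
```

The absolute numbers depend on the machine and the Julia version; the point is only the difference in allocated bytes between the two variants.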

Alseidon · May 4, 2022, 2:19pm

This makes sense, thanks!

Do you have an idea about the instability of the GC time percentage?

julia> @time f(10)
  0.431646 seconds (1.98 M allocations: 570.078 MiB)

julia> @time f(10)
  0.903906 seconds (1.98 M allocations: 570.078 MiB, 52.85% gc time)

julia> @time f(10)
  0.480541 seconds (1.98 M allocations: 570.078 MiB, 11.01% gc time)

kristoffer.carlsson · May 4, 2022, 2:25pm

The GC has various heuristics which relate to the age of objects and total memory allocated etc. So it isn’t too surprising that it varies between runs.
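
One way to make such measurements more comparable is to force a full collection before each timed run, so every run starts from a similar heap state. This is a sketch under that assumption, not a guarantee of stable numbers: `gc_profile` and `work` are hypothetical names introduced here, and some run-to-run variation should still be expected.

```julia
# Hypothetical helper: time `f` several times, calling GC.gc() before each run.
function gc_profile(f; trials = 5)
    f()  # warm-up call, so compilation is excluded from the timings
    results = NamedTuple{(:time, :gctime, :bytes), Tuple{Float64, Float64, Int}}[]
    for _ in 1:trials
        GC.gc()                 # full collection before timing
        s = @timed f()
        push!(results, (time = Float64(s.time), gctime = Float64(s.gctime), bytes = Int(s.bytes)))
    end
    return results
end

# Stand-in workload that allocates many small arrays.
work() = sum(sum(rand(100)) for _ in 1:10_000)

for r in gc_profile(work)
    println(r)
end
```

For more careful benchmarking, a dedicated package such as BenchmarkTools.jl is the usual choice; the helper above only uses Base.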

Alseidon · May 4, 2022, 2:29pm

OK, got it. Thank you very much!

lbenet · October 8, 2022, 10:49pm

Indeed, TaylorIntegration allocates a lot.

Yet, if you use the macro @taylorize to parse your ODEs, things become better:

using TaylorIntegration

const μ = 1.0
const q0 = [0.19999999999999996, 0.0, 0.0, 3.0] # an initial condition for elliptical motion
const order = 28
const t0 = 0.0
const t_max = 10*(2π) # we are just taking a wild guess about the period ;)
const abs_tol = 1.0E-20
const steps = 500000

@taylorize function kepler!(dq, q, params, t)
    r_p3d2 = (q[1]^2+q[2]^2)^(3/2)
    
    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -(μ*q[1])/r_p3d2  # parentheses needed to help `@taylorize`
    dq[4] = -(μ*q[2])/r_p3d2
    
    nothing
end

function task()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps)
    return t[end]
end

function task_noparse()
    t, _ = taylorinteg(kepler!, q0, t0, t_max, order, abs_tol, maxsteps=steps, parse_eqs=false)
    return t[end]
end

Then I get

@time task()  # second run of task()
  0.003598 seconds (2.31 k allocations: 19.578 MiB)

@time task_noparse() # second run of task_noparse()
  0.144778 seconds (198.42 k allocations: 52.445 MiB, 64.56% gc time)

Allocated memory is reduced by a factor of about 2.7 (and the number of allocations by a factor of about 86), and the elapsed time is reduced by a factor of about 40. I am using Julia 1.8 and TaylorIntegration v0.9.1.

Alseidon · March 5, 2023, 11:46pm

Quite the late reply, but it did get much better this way! Using this in the f and f_par functions, with 8 threads, the parallelized version is now about 4-5 times faster, as expected. GC also takes about 7% of the time in both cases. Case solved, thank you!
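
For reference, the serial/threaded comparison can be sketched as follows. This is an illustration only: `task_standin` is a hypothetical cheap workload used so the snippet runs without TaylorIntegration, and, unlike the original f and f_par, both functions return their results so the two versions can be compared. Julia has to be started with several threads (for example `julia -t 8`, or via the `JULIA_NUM_THREADS` environment variable) for the threaded loop to run in parallel.

```julia
using Base.Threads

# Hypothetical stand-in for `task()`; replace with the real integration call.
task_standin(i) = sum(sin, 1:10_000) + i

function f_serial(n)
    xn = zeros(n)
    for i in 1:n
        xn[i] = task_standin(i)
    end
    return xn
end

function f_par(n)
    xn = zeros(n)
    @threads for i in 1:n
        xn[i] = task_standin(i)   # each iteration writes only to its own slot
    end
    return xn
end

println("threads available: ", nthreads())
println("results agree: ", f_serial(16) == f_par(16))
```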

lbenet · March 7, 2023, 5:07pm

Happy it was helpful! Note that in the latest released version, @taylorize has been improved, especially in how it manages allocations.
