Performance hit when using callbacks

krcools · July 28, 2017, 9:40am

I am trying to assemble a matrix using multiple processes. To make sure each process writes only to the part of the matrix it owns, I pass a callback as one of the arguments to the single-process jobs launched by the driver.

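using Distributed  # procs, remotecall_wait, @everywhere (Julia ≥ 0.7; in Base on 0.6)

function assemble()
    M = N = 1000
    A = zeros(M, N)
    store = (v, m, n) -> (A[m, n] += v)  # accumulate v into A[m,n]

    # Use the workers if any exist, otherwise the master process.
    P = procs()
    length(P) > 1 && (P = P[2:end])
    # Block boundaries: process i owns rows splits[i]+1 : splits[i+1].
    splits = [round(Int, s) for s in range(0, stop=M, length=length(P)+1)]

    @sync begin
        for (i, p) in enumerate(P)
            start, stop = splits[i]+1, splits[i+1]
            # Map local row m of the chunk to global row start+m-1.
            storei = (v, m, n) -> store(v, start+m-1, n)
            Mi = stop - start + 1
            @async remotecall_wait(assemblechunk, p, Mi, N, storei)
        end
    end
    A
end

# Fill an M×N chunk with random values through the callback.
@everywhere function assemblechunk(M, N, store)
    for m in 1:M
        for n in 1:N
            v = randn()
            store(v, m, n)
        end
    end
end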

assemble(); @time assemble();
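The row partitioning in assemble and the index shift done by storei can be checked on their own. The sketch below assumes Julia ≥ 1.0 (where range(...; length=...) replaces linspace); row_blocks and offset_store are hypothetical helper names, not part of the original code, and everything runs on a single process.

```julia
# Split rows 1:M into `nblocks` contiguous, non-overlapping blocks, using the
# same rounding as `splits` in assemble().
function row_blocks(M::Int, nblocks::Int)
    splits = [round(Int, s) for s in range(0, stop=M, length=nblocks + 1)]
    return [(splits[i] + 1):splits[i+1] for i in 1:nblocks]
end

# Wrap a global-index callback so a chunk can use local rows 1:length(rows).
# The closure captures only the arguments `store` and `rows`, which are never
# reassigned.
offset_store(store, rows::UnitRange{Int}) =
    (v, m, n) -> store(v, first(rows) + m - 1, n)

# Usage: three chunks of a 6×2 matrix, each writing 1.0 into every entry it owns.
A = zeros(6, 2)
store = (v, m, n) -> (A[m, n] += v)
for rows in row_blocks(6, 3)
    storei = offset_store(store, rows)
    for m in 1:length(rows), n in 1:2
        storei(1.0, m, n)
    end
end
```

If the blocks are disjoint and cover every row, each entry of A is written exactly once, so A ends up filled with ones.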

Doing this causes a large number of allocations and a serious slowdown. The output of @time is:

0.167571 seconds (5.45 M allocations: 90.761 MiB, 8.63% gc time)

When I pass store instead of storei (which I know does not give the desired result), the penalty disappears:

0.011816 seconds (115 allocations: 7.662 MiB)

Any idea where this comes from? Or even how to start analysing it?
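One possible starting point, sketched under the assumption of Julia ≥ 1.1 (for fieldtypes): when a closure captures a variable that lowering cannot treat as assigned once, the variable is stored in a Core.Box whose contents are untyped, so every read becomes a dynamic lookup. The hypothetical helper captures_box checks a closure's fields for such boxes; @code_warntype on a call shows the same thing interactively.

```julia
# Hypothetical helper: true if closure `f` stores any captured variable in a Core.Box.
captures_box(f) = any(T -> T === Core.Box, fieldtypes(typeof(f)))

function make_boxed()
    offset = 0
    f = m -> m + offset
    offset = 1              # reassigned after capture, so `offset` is boxed
    return f
end

function make_plain()
    offset = 1              # assigned once, so stored as a typed (Int) field
    return m -> m + offset
end

fb, fp = make_boxed(), make_plain()
println(captures_box(fb), " ", captures_box(fp))

# Interactively (InteractiveUtils is loaded in the REPL):
#   @code_warntype fb(1)   # shows the Core.Box read and an Any result
```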

krcools · July 28, 2017, 9:56am

Ah…

https://github.com/JuliaLang/julia/issues/15276 (performance of captured variables in closures)

Changing the line that defines start and stop to the following fixed it:

start::Int, stop::Int = splits[i]+1, splits[i+1]
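Julia's performance tips describe this kind of workaround: a captured variable that still ends up in a Core.Box can be given a type declaration, so reads from the box are asserted to that type and the closure body infers; alternatively, a let block gives the closure a fresh binding that is assigned once and needs no box. The minimal sketch below compares the three variants; the function names are hypothetical, and Base.return_types is an internal reflection utility used only to show what inference concludes.

```julia
# Untyped and reassigned after capture: boxed, reads infer as Any.
function shift_untyped()
    start = 0
    f = m -> start + m - 1
    start = 1
    return f
end

# Same, but declared `start::Int`: still boxed, yet each read is asserted
# to be an Int, so the closure's return type is inferred.
function shift_annotated()
    start::Int = 0
    f = m -> start + m - 1
    start = 1
    return f
end

# `let` rebinding: the closure captures a new `start` that is assigned once,
# so it is stored unboxed as an Int field.
function shift_let()
    start = 0
    start = 1
    f = let start = start
        m -> start + m - 1
    end
    return f
end

for f in (shift_untyped(), shift_annotated(), shift_let())
    println(fieldtypes(typeof(f)), "  ", Base.return_types(f, (Int,)))
end
```

The annotation keeps the box (and its allocation) but removes the type instability, which is consistent with the fix above; the let form avoids the box altogether.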
