MPI.jl memory issue in a for-loop

Hi,

First of all, thank you for maintaining the MPI.jl package.

I am using MPI inside a for-loop (for iteration in 1:5000). At each iteration, every rank sends its data to rank 0 using MPI.Gatherv!, then rank 0 sends some data back to all ranks using MPI.Scatterv!.

My code fails after some number of iterations, sometimes with an out-of-memory error on rank 0, sometimes on other ranks. I ran the code multiple times, and it failed at a different iteration each time.

I am confused because the data sizes in all calculations are the same in every single iteration, so why do I run out of memory?

Is there some garbage collection issue with MPI? Should I call MPI.Barrier(comm) at the end of each iteration to wait until all ranks have finished garbage collection? Could you please give me some suggestions on garbage collection with MPI?

Below is an example of my code, including all the MPI functions I use:

using MPI

MPI.Init()

comm    = MPI.COMM_WORLD
my_rank = MPI.Comm_rank(comm)

# my_Z, my_res, Z_all, counts, size_all and my_size are set up here (omitted);
# f1 and f2 are my own functions.

if my_rank == 0
    Z_all_vbuf = VBuffer(Z_all, counts)   # receive buffer, only needed on rank 0
else
    Z_all_vbuf = VBuffer(nothing)         # ignored on non-root ranks
end

for iteration in 1:5000
    my_Z = f1(my_Z, my_res)
    MPI.Gatherv!(my_Z, Z_all_vbuf, 0, comm)                  # all ranks -> rank 0
    if my_rank == 0
        res_all = f2(Z_all)
        res_all_vbuf = VBuffer(res_all, size_all)            # send buffer on rank 0
    else
        res_all_vbuf = VBuffer(nothing)
    end
    my_res = MPI.Scatterv!(res_all_vbuf, my_size, 0, comm)   # rank 0 -> all ranks
end

MPI.Finalize()

Thank you so much,
Carol

I’m not able to run the code sample you provided, but my general suggestion is to allocate all the buffers beforehand if they’re not changing (i.e. move the VBuffer calls outside the loop, and make f1 operate in place).
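Roughly something like this (untested sketch: f1! and f2! stand for hypothetical in-place versions of your f1 and f2, and I’m assuming my_size is the element count of my_res and size_all is the counts vector for res_all, so everything can be allocated once up front):

if my_rank == 0
    Z_all_vbuf   = VBuffer(Z_all, counts)                # reused every iteration
    res_all      = zeros(eltype(Z_all), sum(size_all))   # assumes size_all are element counts
    res_all_vbuf = VBuffer(res_all, size_all)
else
    Z_all_vbuf   = VBuffer(nothing)
    res_all_vbuf = VBuffer(nothing)
end
my_res = zeros(my_size)                                  # pre-allocated receive buffer

for iteration in 1:5000
    f1!(my_Z, my_res)                                    # hypothetical in-place update of my_Z
    MPI.Gatherv!(my_Z, Z_all_vbuf, 0, comm)
    if my_rank == 0
        f2!(res_all, Z_all)                              # hypothetical in-place version of f2
    end
    MPI.Scatterv!(res_all_vbuf, my_res, 0, comm)         # fill my_res in place
end

That way every iteration reuses the same arrays and VBuffer wrappers instead of allocating new ones.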

This is a good performance tip, but the program shouldn’t run out of memory even if you re-allocate the buffers over and over (as long as you don’t retain a reference to the old buffers).

Maybe there is a memory leak in the functions f1 and f2, which @Carol did not provide? In general, if you want help, the advice is to provide a minimal working example so that other people can run your code.

I had a similar problem; see cache the created Datatypes by s-fuerst · Pull Request #675 · JuliaParallel/MPI.jl · GitHub. Is your MPI.jl version >= 0.20.4? If not, updating should hopefully fix the problem.
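You can check the installed version and update with Pkg, e.g.

import Pkg
Pkg.status("MPI")    # shows the installed MPI.jl version
Pkg.update("MPI")    # updates MPI.jl to the latest compatible release

(or ] st MPI / ] up MPI in the package REPL).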