How to GC a SharedArray

What’s the right way to garbage collect a SharedArray? The following doesn’t work: Julia keeps holding on to 8 GB of memory indefinitely (Version 1.6.0-DEV.1237).

using Distributed, SharedArrays  # Distributed is needed for @distributed
begin
    a = SharedArray{Float64}(10000000)
    @distributed for i = 1:10
        a[i] = i
    end
end
GC.gc()

Moving this into a let block made it work. However, this MWE doesn’t seem to be representative of the issue I’m seeing.

Maybe try a = nothing; GC.gc(). If you’ve bound the array to a named variable on the workers, I suspect you’ll have to assign that name something else there too.
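Concretely, something like the sketch below (assuming the array is bound to a, as in your first example; the @everywhere lines only matter if the workers also hold their own bindings to it):

using Distributed

a = nothing              # drop the reference on the master process
@everywhere a = nothing  # drop any worker-side bindings with the same name
@everywhere GC.gc()      # run the GC on the master and on every worker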


Thanks for the tip. We’d set it to nothing locally but not on the workers.

However, the more we look into this, the more it looks like a memory leak that can be replicated without any Distributed involvement, and we have yet to find a good MWE (emphasis on the M).

I am debugging this problem with Ian, so I am going to jump in here. I ran the program under strace to see what is happening (kernel 5.8.15-201.fc32.x86_64). Here are my findings.

When the SharedArray is created, it gets a shared memory object, mmaps it, and unlinks it.

openat(AT_FDCWD, "/dev/shm/jl326782ohMyKWxhO9Rm4KiyE8Qr", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0600) = 29
mmap(NULL, 1365245952, PROT_READ|PROT_WRITE, MAP_SHARED, 29, 0) = 0x7f0c6ac6f000
munmap(0x7f0c6ac6f000, 1365245952)      = 0
close(29)                               = 0

openat(AT_FDCWD, "/dev/shm/jl326782ohMyKWxhO9Rm4KiyE8Qr", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 32
fcntl(32, F_GETFL)                      = 0x28002 (flags O_RDWR|O_LARGEFILE|O_NOFOLLOW)
mmap(NULL, 1365245952, PROT_READ|PROT_WRITE, MAP_SHARED, 32, 0) = 0x7f0c1926f000
unlink("/dev/shm/jl326782ohMyKWxhO9Rm4KiyE8Qr") = 0

The region starting at 0x7f0c1926f000 is not munmapped until the program exits; it is not munmapped when the GC runs. The write syscall for the program’s last line of output (write(22, "DONEDONEDONEDONEDONEDONEDONEDONE", 32) = 32) runs before this munmap call:

munmap(0x7f0c1926f000, 1365245952)      = 0

I am confident that the SharedArray is finalized when the GC runs, because the ref put in SharedArrays.sa_refs is deleted, and that only happens in the finalizer finalize_refs. Finalizing the SharedArray should also finalize its member s, though, and Mmap.mmap registers a finalizer for s that calls munmap, so I don’t know why that munmap is never called.
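One way to poke at this from the REPL is to run the finalizer by hand and watch that table. Note that sa_refs and the s field are SharedArrays internals, so this sketch may not hold across versions:

using Distributed, SharedArrays

a = SharedArray{Float64}(10_000_000)
length(SharedArrays.sa_refs)   # counts an entry for the live array

finalize(a)                    # run finalize_refs now instead of waiting for the GC
length(SharedArrays.sa_refs)   # the entry for a should be gone after finalization
GC.gc()                        # the mmapped buffer behind s should only be released once its
                               # own Mmap finalizer runs, which is the step that appears to be missing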

Hi,
I think I am facing a similar issue. Here is another MWE:

using Distributed, SharedArrays

function parallel_compute()
    shared_array = SharedArray{Float64,1}(100_000)
    @sync @distributed for i in 1:100_000
        shared_array[i] = randn()
    end
    return sum(shared_array)
end

addprocs(4)

for t = 1:10
    x = parallel_compute()
    println(Sys.free_memory()/2^20)
end

rmprocs(workers())

When I run this code, the amount of free RAM available (printed in MiB in the main loop) keeps decreasing:

493.28515625
492.31640625
491.46875
478.390625
477.6640625
476.81640625
463.73828125
463.01171875
462.52734375
461.55859375

If I push it further (with more iterations and a more complex parallel_compute function), my code breaks because it runs out of memory. I would also expect the GC to free up some space after parallel_compute is called, but it seems that it doesn’t. I tried shared_array = nothing; GC.gc() at the end of parallel_compute, but it didn’t help. Is this the expected behavior? Am I missing something?

I solved my problem with

@everywhere shared_array = nothing
@everywhere GC.gc()
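Applied to the loop in the MWE above, that looks roughly like this (the @everywhere shared_array = nothing part only matters if the name is also bound to a global on the workers):

for t = 1:10
    x = parallel_compute()
    @everywhere GC.gc()                # collect on the master and on every worker
    println(Sys.free_memory() / 2^20)
end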

I was missing that GC.gc() needs to be called on all workers; sorry for the duplicate question.