Memory leak with @distributed

Hello everyone,

I found this sample code somewhere on the internet. It works, but it leaks memory.

Where is the error, and how can the code be made to work correctly while still using @distributed?

using Distributed
addprocs(16) 
N_tot = 1000000000

tstart = time()
N_in = @distributed (+) for n = 1:N_tot
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        r2 = x * x + y * y
        if r2 < 1.0
                1
        else
                0
        end
end
tstop = time()
pi_MC = N_in / N_tot * 4.0
println("time @distributed    = ", tstop - tstart, " seconds")
println("pi estimate = ", pi_MC)

By the way, the leak only happens when I start it in VSCode. From the command line it works fine.

One more question: why does the code need so much RAM in the first place, and is it possible to work around this?

Your code is getting compiled and cached in memory.

Why do you think there is a memory leak?

After the program finishes (when started from VSCode), the memory is not released. I always have to terminate the REPL manually to free it.
If I didn't do this, my PC would become unusable after a few test runs because no RAM would be left. From my point of view this is not ideal behavior. So what can I do?

Most of the RAM is probably being used for the Julia runtime (JIT compiler and all cached functions etc). As you are using multiprocessing, you have a copy of this for each worker you add which vastly increases the RAM usage. If you are only on a single machine, I would recommend you use multithreading for parallelism instead as it uses far less RAM and will mostly be faster due to lower latency and access to shared memory.

Your code with multithreading:

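using Base.Threads
# Throw one dart at the square [-1, 1]^2; true if it lands inside the unit circle.
function throw_dart()
    x = rand()*2-1
    y = rand()*2-1
    return (x*x+y*y<=1)
end
function est_pi_threaded(n)
    n_in = Atomic{Int}(0)
    block_size = cld(n, nthreads()) # ceiling divide
    @threads for t in 1:nthreads()
        # each block counts its own hits, then adds them to the shared total once
        hits = mapreduce(x->throw_dart(), +, 1:block_size)
        atomic_add!(n_in, hits)
    end
    # divide by the darts actually thrown, which can exceed n after rounding up
    return 4*n_in[]/(block_size*nthreads())
end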

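You can also do this without the atomic by allocating an array, or by using the ThreadsX package with mapreduce.

EDIT: Make sure Threads.nthreads() returns something sensible (i.e. equal to the number of available cores on the machine). The thread count is fixed when Julia starts, for example with julia --threads=auto or the JULIA_NUM_THREADS environment variable.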
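
For reference, the array-based variant mentioned above could look roughly like this. It is only a sketch: the name est_pi_array is made up for illustration, and it assumes one block per thread, with each block writing to its own slot so that no atomic is needed.

```julia
using Base.Threads

# Throw one dart at the square [-1, 1]^2; true if it lands inside the unit circle.
function throw_dart()
    x = rand() * 2 - 1
    y = rand() * 2 - 1
    return x * x + y * y <= 1
end

# Hypothetical helper: same estimate as est_pi_threaded, but without an Atomic.
function est_pi_array(n)
    nblocks = nthreads()
    block_size = cld(n, nblocks)      # ceiling divide
    hits = zeros(Int, nblocks)        # one slot per block, so no two tasks share a slot
    @threads for b in 1:nblocks
        hits[b] = count(_ -> throw_dart(), 1:block_size)
    end
    # divide by the number of darts actually thrown
    return 4 * sum(hits) / (block_size * nblocks)
end
```

Calling est_pi_array(10^8) should return a value close to pi. The slots are indexed by block number rather than by threadid(), so the result does not depend on which thread runs which block.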


You should only use addprocs once at the start, and not rerun this part, as you are probably spinning up a lot more processes than you think by rerunning the script multiple times.
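
One way to make the script safe to rerun in the same session is to add workers only when they are missing. This is a minimal sketch; ensure_workers is a hypothetical helper, not part of Distributed, and it assumes you want a fixed number of local workers.

```julia
using Distributed

# Hypothetical helper: top up to n worker processes instead of adding n more on every run.
function ensure_workers(n)
    current = nprocs() - 1            # nprocs() also counts the main process
    if current < n
        addprocs(n - current)
    end
    return workers()
end

ensure_workers(2)   # first run: starts 2 workers
ensure_workers(2)   # rerun: does nothing, the workers already exist
```

When you are done, nprocs() > 1 && rmprocs(workers()) shuts the workers down again, which should also give their memory back to the operating system without closing the REPL.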


I'm impressed. Great help. Thank you!


Perhaps there is some confusion. When you run code from VSCode it sends the code to the Julia session running in the REPL. It does not start a new Julia session each time nor does it exit a Julia session each time.

Besides accumulating processes, you are also storing variables as globals. Globals stay alive for the whole session, so unless you rebind them (for example by setting them to nothing), whatever they reference cannot be garbage collected.
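
A sketch of how the original @distributed loop could be wrapped in a function so that nothing is left behind in global scope. The function name est_pi_distributed is an assumption for illustration; the loop body is the one from the question.

```julia
using Distributed

# Hypothetical wrapper: all intermediate values are locals and go away when it returns.
function est_pi_distributed(n)
    n_in = @distributed (+) for i in 1:n
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        x * x + y * y < 1.0 ? 1 : 0   # 1 if the dart lands inside the unit circle
    end
    return 4 * n_in / n
end

println("pi estimate = ", est_pi_distributed(10^7))
```

If no workers have been added, the loop simply runs on the main process, so the same function can be tried before calling addprocs.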