Memory leak with @distributed

Ralf_Dietrich · February 5, 2023, 7:02am

Hello everyone,

I found this sample code somewhere on the internet. It works, but it leaks memory.

Where is the error here and how would the code work correctly while maintaining the use of @distributed?

using Distributed
addprocs(16) 
N_tot = 1000000000

tstart = time()
N_in = @distributed (+) for n = 1:N_tot
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        r2 = x * x + y * y
        if r2 < 1.0
                1
        else
                0
        end
end
tstop = time()
pi_MC = N_in / N_tot * 4.0
println("time @distributed    = ", tstop - tstart, " seconds")
println("pi estimate = ", pi_MC)

By the way, the leak only happens when I start the code in VS Code. From the command line it works fine.

One more question: why does the code need so much RAM in the first place, and is there a way to work around this?

mkitti · February 5, 2023, 7:33am

Your code is getting compiled and cached to memory.

Why do you think there is a memory leak?

Ralf_Dietrich · February 5, 2023, 8:02am

After the program exits (when started from VS Code), the memory is not released. I always have to terminate the REPL manually to release it.
If I didn't do this, my PC would lock up after a few test runs because no RAM would be left. From my point of view this is not ideal behavior. So what can I do?

jmair · February 5, 2023, 8:09am

Most of the RAM is probably being used by the Julia runtime (the JIT compiler, all cached functions, etc.). Because you are using multiprocessing, every worker you add has its own copy of this, which vastly increases RAM usage. If you are on a single machine, I would recommend multithreading for parallelism instead: it uses far less RAM and will usually be faster thanks to lower latency and access to shared memory.

Your code with multithreading:

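using Base.Threads

# One dart throw: true if the random point lands inside the unit circle.
function throw_dart()
    x = rand()*2-1
    y = rand()*2-1
    return (x*x+y*y<=1)
end

function est_pi_threaded(n)
    n_in = Atomic{Int}(0)
    block_size = cld(n, nthreads()) # ceiling divide
    @threads for t in 1:nthreads()
        # Each task counts its own hits, then adds them to the shared total once.
        hits = mapreduce(x->throw_dart(), +, 1:block_size)
        atomic_add!(n_in, hits)
    end
    # cld rounds up, so divide by the number of darts actually thrown.
    return 4*n_in[]/(block_size*nthreads())
end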

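You can also do this without the atomic by allocating an array with one slot per block, or by using the ThreadsX package with mapreduce.

EDIT: Make sure Julia is started with a sensible number of threads (e.g. julia -t auto, or via the JULIA_NUM_THREADS environment variable), so that Threads.nthreads() equals the number of available cores on the machine.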
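
The array-based variant mentioned above might look roughly like the following. This is only a sketch: the name est_pi_array is made up for illustration, and each block writes to its own slot (indexed by the loop variable, not by thread id), so no atomic is needed.

```julia
using Base.Threads

# One dart throw: true if the random point lands inside the unit circle.
function throw_dart()
    x = 2rand() - 1
    y = 2rand() - 1
    return x * x + y * y <= 1
end

# Hypothetical name; one result slot per block instead of a shared atomic.
function est_pi_array(n)
    nblocks = nthreads()
    block_size = cld(n, nblocks)          # ceiling divide
    hits = zeros(Int, nblocks)            # hits[b] is written only by block b
    @threads for b in 1:nblocks
        hits[b] = count(_ -> throw_dart(), 1:block_size)
    end
    # Divide by the number of darts actually thrown.
    return 4 * sum(hits) / (block_size * nblocks)
end

println(est_pi_array(10_000_000))
```

Since each block writes its slot only once, at the end, contention between threads on the shared array should be negligible.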

jmair · February 5, 2023, 8:11am

You should call addprocs only once at the start and not rerun that part. By rerunning the script multiple times you are probably spinning up many more processes than you think.
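
One possible way to make the script safe to rerun in the same session is to add workers only when they are missing. The sketch below assumes a helper called ensure_workers, which is a made-up name and not part of Distributed; nprocs, nworkers and addprocs are the standard functions.

```julia
using Distributed

# Hypothetical helper: top up to `n` workers instead of blindly adding `n` more.
function ensure_workers(n)
    current = nprocs() - 1                # nprocs() counts the main process too
    current < n && addprocs(n - current)
    return nworkers()
end

ensure_workers(2)            # first run: starts 2 workers
ensure_workers(2)            # rerun: nothing new is started
println("workers: ", nworkers())
```

With a guard like this, rerunning the whole file from VS Code keeps the worker count constant rather than adding another batch each time.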

Ralf_Dietrich · February 5, 2023, 8:33am

I’m impressed. Great help. Thank you!

mkitti · February 6, 2023, 2:15am

Perhaps there is some confusion. When you run code from VS Code, it sends the code to the Julia session already running in the REPL. It does not start a new Julia session each time, nor does it exit the session when the code finishes.

Besides accumulating processes, you are also storing your variables as globals. Because these globals never go out of scope and are not reset to nothing, whatever they reference cannot be garbage collected.
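
A minimal sketch of how both points could be addressed while keeping @distributed: put the computation in a function so its intermediates are locals, and remove the workers when you are finished. The function name est_pi_distributed is only illustrative, and the worker count of 2 is an arbitrary choice for the example.

```julia
using Distributed

# Start workers only if none exist yet (safe to rerun in the same session).
nprocs() == 1 && addprocs(2)

# Hypothetical name; locals inside the function can be collected after it returns.
function est_pi_distributed(n)
    n_in = @distributed (+) for i in 1:n
        x = 2rand() - 1
        y = 2rand() - 1
        x * x + y * y < 1 ? 1 : 0
    end
    return 4 * n_in / n
end

println("pi estimate = ", est_pi_distributed(10_000_000))

# Shut the worker processes down again to give their memory back to the OS.
nprocs() > 1 && rmprocs(workers())
```

The main session itself still keeps its compiled code in memory until the REPL is closed, as described above.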
