Memory trouble with @distributed

I’m trying to parallelize a piece of code using Distributed, but it’s slowly eating up all my memory. Here’s an example:

using Distributed
using SharedArrays
if nprocs() == 1
    addprocs(4)
end

function main()
    for i=1:100
        consumes_mem()
    end
end

function consumes_mem()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0); 
    @sync @distributed for i=1:1000 # Memory use seems to grow every time this loop is called
        random_num = randn(100000) 
        @. shared_var[:,i] = random_num
    end
    result = sum(shared_var)
    return result
end 

main()

When I run this code, my memory usage keeps growing. It seems that memory usage increases every time the loop that writes to shared_var runs. Adding @everywhere GC.gc() to the loop seems to resolve the problem, but it causes a massive slowdown. What am I missing here?
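For reference, this is roughly what I mean by adding @everywhere GC.gc() to the loop (a sketch; the rest of the code is unchanged from above, and the exact placement in my real code may differ):

function main()
    for i = 1:100
        consumes_mem()
        # Force a collection on the master and on every worker after each call.
        # This keeps memory bounded, but makes everything much slower.
        @everywhere GC.gc()
    end
end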

I know this example can be easily rewritten with pmap or threads, but I just don’t understand why this code is behaving so awkwardly.


This seems to be a feature of SharedArrays; see this more minimal example:

using SharedArrays

function consumes_mem()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0)
    for i=1:1000 # Memory use seems to grow every time this loop is called
        random_num = randn(100000) 
        @. shared_var[i,:] = random_num
    end
    result = sum(shared_var)
    return result
end 

for i=1:100
    println(string(consumes_mem()))
end

which reproduces the memory consumption. (By the way, your code has an indexing error: it should be @. shared_var[i,:] = random_num; note the row/column indices.)

Why do I say feature? Because:

help?> SharedArray
...
The shared array is valid as long as a reference to the SharedArray object exists on the node which created the mapping.
...

Of course, in the above case it shouldn't outlive the function call.
There is an existing issue: Memory Usage and SharedArrays · Issue #33990 · JuliaLang/julia · GitHub. It may not be exactly the right issue, but there are several related ones in the tracker (Issues · JuliaLang/julia · GitHub). Perhaps opening another, more specific one wouldn't hurt?

But as a workaround, if this really is a problem you need to solve, your original code would be better written like this:

using Distributed
using SharedArrays
if nprocs() == 1
    addprocs(4)
end

function main()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0); 
    for i=1:100
        r = consumes_mem(shared_var)
        println(string(i) * ": " * string(r))
    end
end

@everywhere function consumes_mem(shared_var)
    @sync @distributed for i=1:1000 # Memory use seems to grow every time this loop is called
        random_num = randn(100000) 
        @. shared_var[i,:] = random_num
    end
    result = sum(shared_var)
    return result
end 

main()

Thanks! It was indeed the allocation of the SharedArray on each outer loop iteration that filled up my memory. I suspected that memory filled up at each inner loop iteration, but that's not the case. This example doesn't seem to cause a memory overflow:

using Distributed
using SharedArrays
if nprocs() == 1
    addprocs(4)
end

function main()
    consumes_mem()
end

function consumes_mem()
    println(nprocs())
    shared_var = SharedArray{Float64,2}(10000,10000,init=0.0); 
    @sync @distributed for i=1:1000000
        idx = rand(1:10000)  # pick a random column to overwrite
        random_num = randn(10000) 
        @. shared_var[:,idx] = random_num
    end
    result = sum(shared_var)
    return result
end 

main()

Do you know of a more elegant way of de-allocating the SharedArray than just setting it to an empty object and manually calling @everywhere GC.gc()?
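For reference, this is roughly what I do now (a sketch; "setting it to an empty object" just means dropping the reference, and Distributed/SharedArrays are loaded as above):

function consumes_mem()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0)
    # ... fill shared_var with the @distributed loop as before ...
    result = sum(shared_var)
    shared_var = nothing    # drop the master's reference to the SharedArray
    @everywhere GC.gc()     # collect on every process so the mapping actually gets released
    return result
end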

My example above shows a better way, but perhaps it doesn't work for your real-world problem?

In your example, the shared array can be allocated just once on the master and passed as an argument to the function consumes_mem.

With that, there is no need to deallocate it with GC.gc().

Yes, I agree! The need came from reading netCDF files in parallel, with each slice of the SharedArray coming from a different file. But this solved the issue.
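For anyone who finds this later, the pattern I ended up with looks roughly like the sketch below. read_slice is a hypothetical placeholder for whatever actually reads one netCDF file (the real code uses a netCDF package), and the file names are made up:

using Distributed
using SharedArrays
if nprocs() == 1
    addprocs(4)
end

# Hypothetical reader standing in for the real netCDF call on each worker.
@everywhere read_slice(filename) = randn(1000)

function main()
    files = ["data_$(i).nc" for i in 1:100]                   # made-up file names
    shared_var = SharedArray{Float64,2}(1000, length(files))  # allocated once on the master
    @sync @distributed for i in 1:length(files)
        shared_var[:, i] = read_slice(files[i])               # each worker fills one column
    end
    return sum(shared_var)
end

main()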