I’m trying to parallelize a piece of code using Distributed, but it’s slowly eating up all my memory. Here’s an example:
using Distributed
using SharedArrays

if nprocs() == 1
    addprocs(4)
end

function main()
    for i=1:100
        consumes_mem()
    end
end

function consumes_mem()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0);
    @sync @distributed for i=1:1000 # Memory use seems to grow every time this loop is called
        random_num = randn(100000)
        @. shared_var[:,i] = random_num
    end
    result = sum(shared_var)
    return result
end

main()
When I run this code, my memory usage keeps growing; it seems to increase with every iteration of the loop that writes to shared_var. Adding @everywhere GC.gc() in the loop seems to resolve the problem, but it causes a massive slowdown. What am I missing here?
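For reference, the workaround looks roughly like this (just a sketch: a forced collection on every worker after each call to consumes_mem()):

function main()
    for i=1:100
        consumes_mem()
        @everywhere GC.gc() # keeps memory use flat, but is very slow
    end
end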
I know this example can be easily rewritten with pmap or threads, but I just don’t understand why this code is behaving so awkwardly.
This seems to be a feature of SharedArrays; see this more minimal example:
using SharedArrays

function consumes_mem()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0)
    for i=1:1000 # Memory use seems to grow every time this loop is called
        random_num = randn(100000)
        @. shared_var[i,:] = random_num
    end
    result = sum(shared_var)
    return result
end

for i=1:100
    println(string(consumes_mem()))
end
which reproduces the memory growth. (By the way, your code has an error: it should be @. shared_var[i,:] = random_num; check the row/col indices.)
Why do I say feature? Because:
help?> SharedArray
...
The shared array is valid as long as a reference to the SharedArray object exists on the node which created the mapping.
...
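In other words, each call to consumes_mem() creates a new memory-mapped segment, and that segment is only released once the SharedArray object that owns it has actually been garbage collected on the creating process (and on the workers that hold references to it). That is also why your explicit @everywhere GC.gc() makes the growth go away. A minimal sketch of the mechanism (array sizes are arbitrary):

using Distributed, SharedArrays
nprocs() == 1 && addprocs(2)

function churn()
    S = SharedArray{Float64,2}(1000, 10000) # a fresh mmapped segment on every call
    S .= 1.0
    return sum(S) # S becomes unreachable after the call returns...
end

for i = 1:50
    churn()
    # ...but the segment stays mapped until the dead SharedArray objects are
    # collected, so without this forced collection the mappings pile up.
    @everywhere GC.gc()
end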
But as a workaround, if this really is a problem you need to solve, your original code would be better written like this:
using Distributed
using SharedArrays

if nprocs() == 1
    addprocs(4)
end

function main()
    shared_var = SharedArray{Float64,2}(1000,100000,init=0.0);
    for i=1:100
        r = consumes_mem(shared_var)
        println(string(i)*": "*string(r))
    end
end

@everywhere function consumes_mem(shared_var)
    @sync @distributed for i=1:1000
        random_num = randn(100000)
        @. shared_var[i,:] = random_num
    end
    result = sum(shared_var)
    return result
end

main()
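With this version the memory-mapped segment is created once in main() and reused by every call to consumes_mem(), so nothing accumulates across the 100 iterations.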
Thanks! It was indeed the allocation of the SharedArray on every outer iteration that filled up my memory. I had suspected that the memory filled up at each inner loop iteration, but that's not the case. This example doesn't seem to cause a memory overflow:
using Distributed
using SharedArrays

if nprocs() == 1
    addprocs(4)
end

function main()
    consumes_mem()
end

function consumes_mem()
    println(nprocs())
    shared_var = SharedArray{Float64,2}(10000,10000,init=0.0);
    @sync @distributed for i=1:1000000
        idx = rand([1:10000...])
        random_num = randn(10000)
        @. shared_var[:,idx] = random_num
    end
    result = sum(shared_var)
    return result
end

main()
Do you know of a more elegant way of de-allocating the SharedArray than just setting it to an empty object and manually calling @everywhere GC.gc()?
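(What I mean by that is roughly this sketch: dropping the only reference and then collecting everywhere.)

shared_var = nothing # drop the last reference so the SharedArray becomes garbage
@everywhere GC.gc()  # collect on every process so the mapped segment is released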
Yes, agree! The need came from reading netCDF files in parallel, with each slice of the SharedArray coming from a different file. But this solved the issue.