Hi there!
I was doing some processing using Julia 1.1.0 and ClusterManagers v0.3.2
I have two nodes, and I tried launching a function on the second node, using worker 3, with
fetch(@spawnat 3 myfunction())
I did this because the function is memory-intensive and doesn't fit in the memory of my initial node.
The function runs fine, but when I try to run exactly the same function again with similar arguments, the worker dies with an
OutOfMemoryError
My question is: wasn't the memory used by the first call freed after the fetch?
I tried the following to see what was happening:
julia> using Distributed, ClusterManagers
julia> addprocs(SlurmManager(2), m="cyclic")
2-element Array{Int64,1}:
 2
 3
julia> fetch(@spawnat 3 Sys.total_memory() / 2^20)
128808.3671875
julia> fetch(@spawnat 3 Sys.free_memory() / 2^20)
125345.609375
julia> @everywhere function bigarray()
           rand(500, 500, 100, 180)
           return "hello"
       end
julia> fetch(@spawnat 3 bigarray())
"hello"
julia> fetch(@spawnat 3 Sys.free_memory() / 2^20)
91708.76953125
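For reference, the drop in free memory (125345.609375 - 91708.76953125 ≈ 33637 MiB) roughly matches the size of the array that bigarray allocates, with the small gap presumably due to a partial collection or other allocations:
julia> 500 * 500 * 100 * 180 * 8 / 2^20   # number of Float64 elements times 8 bytes, in MiB
34332.275390625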
So the memory on the worker is apparently not freed automatically, which explains why my function cannot run twice: its memory usage fits on that machine once, but not twice.
Is this behavior expected? And is calling @spawnat 3 GC.gc() on the worker the ideal solution?
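That is, something along these lines (just a minimal sketch of what I have in mind):
using Distributed

fetch(@spawnat 3 bigarray())   # first memory-hungry call on worker 3
remotecall_fetch(GC.gc, 3)     # ask worker 3 to run a full garbage collection and wait for it
fetch(@spawnat 3 bigarray())   # second call, hoping the memory from the first one has been released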