I have a script that calculates millions of string distances with StringDistances.jl. I’m running it on a virtual machine with 64 CPUs and 240 GB of RAM, and the script was killed for running out of memory. I cleaned up the script to minimize object allocations, but Julia still takes up to 240 GB. Memory consumption over time is far from linear; it looks more like a sawtooth. It seems the garbage collector kicks in during the first few minutes and then stops running entirely. The behavior also depends on how many threads I run. When the number of threads is much smaller than the number of available CPUs (e.g. 30 out of 64), it works. But when I run the script with 60 threads on the 64-CPU VM, I run out of memory after some time. The only way I have found to finish the calculation is to add explicit memory control:
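Threads.@threads for item in list
    # do something useful with `item`
    # In my case this part takes a few minutes
    # ...
    # Force a full collection when less than 10% of system memory is free
    if (Sys.free_memory() / Sys.total_memory() < 0.1)
        GC.gc()
        sleep(10)  # note: the argument of sleep() is in seconds
    end
end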
I run it with JULIA_NUM_THREADS=60 julia --project=@. src/...
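For readers who want to try the same workaround, here is a minimal self-contained sketch of it. The helper name `collect_if_low_memory`, the `pause` keyword, and the stand-in workload in `process_all` are assumptions made for illustration; they are not part of the original script. Only `Sys.free_memory()`, `Sys.total_memory()`, `GC.gc()` and `sleep()` come from the snippet above. `sleep` takes seconds, so the default pause here is 10 ms rather than the `sleep(10)` shown above.

```julia
using Base.Threads

# Hypothetical helper: run a full GC pass when the fraction of free system
# memory drops below `threshold`. Returns true if a collection was triggered.
function collect_if_low_memory(threshold::Real = 0.1; pause::Real = 0.01)
    free_fraction = Sys.free_memory() / Sys.total_memory()
    if free_fraction < threshold
        GC.gc()        # full collection
        sleep(pause)   # in seconds; briefly takes this task off the core
        return true
    end
    return false
end

# Stand-in workload (an assumption, not the original script): build
# short-lived strings so that there is garbage to collect.
function process_all(items)
    results = Vector{Int}(undef, length(items))
    @threads for i in eachindex(items)
        results[i] = length(string(items[i]) ^ 10)
        collect_if_low_memory()
    end
    return results
end

results = process_all(1:100)
```

Keeping the check in one function makes it easy to change the threshold or to remove the workaround once it is no longer needed.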
After ~10 hours of calculation with 60 threads I got the results. I can say that my script’s real memory consumption is less than 5-10 GB, not the 240 GB of available RAM.
So, my questions are: how can I avoid that explicit code in my script, and does Julia clean up memory automatically, or does it just keep accumulating garbage until the operating system kills it for running out of memory?
While Julia should certainly handle this, I just want to ask a simple question, since no MWE is provided: can you pre-allocate? If all 60 threads are comparing the biggest strings at once, what’s the estimated memory consumption?
Essentially, because the GC pass requires all threads to hit a safepoint (and then pause), it’s possible that one or more “runaway” threads keep the GC pass from occurring, and so those threads can keep accumulating garbage that isn’t GC’d before hitting the memory limit. The PR linked in that issue (https://github.com/JuliaLang/julia/pull/33092) provides a means to explicitly do a cheap safepoint check in the middle of compute-bound code. Maybe it’s worth giving that PR a try?
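As a rough sketch of what such an explicit safepoint check could look like, assuming a Julia version that provides `GC.safepoint()` (documented as available since Julia 1.4): the kernel `busy_sum`, the wrapper `run_all`, and the check interval of 1024 iterations are all hypothetical and only illustrate the pattern.

```julia
using Base.Threads

# Hypothetical compute-bound kernel. Its inner loop does not allocate, so it
# may run for a long time without reaching a GC safepoint on its own.
function busy_sum(n::Integer)
    acc = 0
    for i in 1:n
        acc += i % 7
        # Every 1024 iterations, give a pending GC pass a chance to stop
        # this thread here. The interval is an arbitrary choice.
        if i % 1024 == 0
            GC.safepoint()
        end
    end
    return acc
end

# Run the kernel for several sizes, one per loop iteration, across threads.
function run_all(sizes)
    out = Vector{Int}(undef, length(sizes))
    @threads for k in eachindex(sizes)
        out[k] = busy_sum(sizes[k])
    end
    return out
end

totals = run_all([7, 14, 10_000])
```

The check is cheap when no collection is pending, so placing it every so many iterations should cost little compared to the work between checks.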
@jling, yes, I’m pre-allocating and even pre-processing the strings before doing the distance calculations. The task is actually elementary. Simply put, I have two lists of strings of ~10-30 characters each. In the case I mentioned (10 h with 60 threads) it was 20k + 70k strings. I’m using the RatcliffObershelp() metric for the distance calculation. I even sort the tokens only once for these strings. The 60 threads iterate over the 20k list.
@jpsamaroo, thanks. It looks like I used an old-style but almost proper way to fix it temporarily. The task where I found the issue is a one-off run, so I can definitely check it with the safepoint you mentioned. At the same time, I hope it will be fixed soon.
I had the same type of problem with a parallel computation of mine. Julia’s default garbage collection did not keep memory use safe when many threads were launched.
This workaround solved the problem completely, but why did you add the sleep(10) directive?
The only reason I added sleep() is to give up the current processor time slice scheduled by the operating system. 10 is less than a typical time slice. That way, I expect the garbage collector thread to get a chance to do something useful.
I think this thread can be helpful for an issue I’m working on with segfaults in Julia. However, I did not quite understand the purpose of sleep(10). Could you elaborate on it a little more?
When a thread sleeps, it does not take part in task scheduling. 10 milliseconds is less than the minimal time slice in most operating systems, but it is enough to switch the processor core to another thread. (Note that Julia’s sleep() takes its argument in seconds, so sleep(10) as written pauses for 10 seconds; 10 milliseconds would be sleep(0.01).) The garbage collection issue happens when the system is heavily loaded. So excluding the current computational thread from the scheduling cycle increases the probability that auxiliary threads finish their utility work, including background garbage collection.
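A small sketch of the pause itself, in case it helps: Julia’s `sleep` takes its argument in seconds, so a 10 ms pause is written `sleep(0.01)`. The variable name `t` is only illustrative, and `yield()` is shown as a lighter alternative that hands control to other Julia tasks without waiting for a fixed time; whether either is enough to unblock the GC in a given workload is an assumption to verify.

```julia
# Measure a 10 ms pause; sleep() expects seconds, not milliseconds.
t = @elapsed sleep(0.01)
println("slept for about ", round(t * 1000; digits = 1), " ms")

# yield() lets other runnable Julia tasks proceed and returns as soon as
# this task is scheduled again, with no fixed delay.
yield()
```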