Garbage collection not aggressive enough on Slurm Cluster

That’s strange. I also use Julia on a Slurm cluster and have never encountered this issue. On the contrary, when I lower the amount of memory per CPU, I typically get close to full memory efficiency, while GC time increases.
Unfortunately, I cannot directly help with this problem as I am no expert, but maybe the difference in our use cases helps to identify the issue?
My particular use case involves using Threads over all cores of a node and distributing equivalent jobs over workers via pmap. As a result, I do not have that many distinct workers. Do you know whether your problem depends on the number of workers, i.e., is there still a memory leak if you use only one worker?
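A minimal sketch of this kind of setup (threads inside each job, a single pmap across workers) could look like the following. The local `addprocs(2)`, the function `threaded_job`, and the job sizes are placeholders for illustration, not the actual code; on a Slurm cluster the workers would normally be started through a cluster manager and with more than one thread each.

```julia
using Distributed

# Assumption: two local workers stand in for the cluster nodes.
addprocs(2)

@everywhere begin
    # Hypothetical job: sum of squares of 1:n, split into one chunk per thread.
    function threaded_job(n::Int)
        chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
        tasks = [Threads.@spawn sum(abs2, chunk) for chunk in chunks]
        return sum(fetch, tasks)
    end
end

# One pmap call distributes equivalent jobs over the available workers.
results = pmap(threaded_job, fill(1_000, 4))
```

With this layout the number of workers stays small (roughly one per node), and each worker process lives for the whole pmap call.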
In any case, I hope you find a solution,
Salmon

Edit: To clarify, I am only using one instance of pmap() in a program, so if the issue is freeing memory after a worker is completely finished, this might be why it works in my case.
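To check whether the behaviour depends on the worker count, something along these lines might help. It is only a rough sketch: `alloc_job` is a made-up stand-in for the real workload, and a single local worker replaces the Slurm allocation. It runs one pmap, reads the worker's peak resident memory, and then explicitly requests a collection on that worker.

```julia
using Distributed

# Assumption: a single local worker, to see whether the problem persists with one worker.
w = first(addprocs(1))

# Hypothetical allocation-heavy job standing in for the real workload.
@everywhere alloc_job(n) = sum(rand(n))

results = pmap(alloc_job, fill(10^6, 8))

# Peak resident set size of the worker process, in bytes.
peak_rss = remotecall_fetch(Sys.maxrss, w)
println("worker $w peak RSS: ", round(peak_rss / 2^20; digits = 1), " MiB")

# Explicitly request a full collection on the worker.
# Whether the freed memory is returned to the OS is up to the runtime.
remotecall_fetch(GC.gc, w)

rmprocs(w)
```

If memory use stays bounded here but grows with more workers or with repeated pmap calls, that would point towards the worker lifecycle rather than the jobs themselves.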
