Julia killed with Out of memory error on Linux -- runs fine on MacOS

I have a piece of code that runs smoothly on MacOS but keeps crashing on my Linux (ubuntu 22) workstation. The code is quite involved and it is hard to write a meaningful minimal working example for it – it goes through a set of large HDF5 files containing timeseries and computes convolutions with a separate set of signals (this part is computationally expensive), but does not store results or return large arrays (it simply returns a small set of array indices, and pauses to save some text files at some stages).

On a macOS system (version 12.6), there is no issue, and running it on 2 CPUs (super basic parallelisation: each CPU deals with a different file) I can see that each process never requires more than about 1.5 GB of memory. The memory requirements do not grow over time and the code executes nicely until all files have been visited.

The exact same code, with the same environment and data files, gets systematically killed by the OOM killer (checked with sudo dmesg) when I run it on a Linux system (Ubuntu 22.04.2 LTS). I have 32 GB of RAM in both systems, but I can see the memory used by the julia processes grow over time when it runs on Linux. I have tried going from multiple CPUs to a single CPU (removing all use of Distributed), I have tried calling GC.gc() at different stages within the loops, and I have tried starting julia with --heap-size-hint=2G. None of this made any substantial difference, and julia gets killed at some early point in each run.

I have seen this issue https://github.com/JuliaLang/julia/issues/42566 and this one https://github.com/JuliaLang/julia/issues/50658 which sound related, but the discussions there go well above my head.

Note: I remember running that same piece of code a few years back on an older Ubuntu version (LTS 18 or so, I cannot remember the details) and it ran perfectly. My current understanding (for what it is worth) is that the issue is rooted in how the OS interacts with Julia (?).

Now the question is: have other users experienced similar issues? And is there any way to (at least temporarily) work around this?

The second issue you linked seems the most pertinent.

The situation you are encountering sounds like a “memory leak”.

One complication may be the interaction with HDF5. Are you properly closing your datasets and files so that HDF5 can release memory?
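For illustration, here is a minimal sketch of reading a dataset so that the file is always closed afterwards. It assumes the HDF5.jl package; the helper name `summarize_dataset` and the dataset name are made up for the example and are not taken from the original code.

```julia
using HDF5  # assumes the HDF5.jl package is installed

# Hypothetical helper: reads one dataset and returns a small summary value.
function summarize_dataset(path::AbstractString, name::AbstractString)
    # The do-block form closes the file when the block exits, even on error.
    h5open(path, "r") do file
        data = read(file, name)  # copies the dataset into a regular Julia array
        return sum(data)         # only a small value leaves the block
    end
end
```

With this pattern nothing that refers to the open file escapes the block, so HDF5 gets a chance to release its buffers after each file.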

We really could use some more information about your environment.

What is the output of versioninfo() and Pkg.status() on both macOS and Linux?
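A small sketch for collecting both outputs as one string, so they can be pasted from each machine. The function name `environment_report` is just an illustration.

```julia
using InteractiveUtils  # provides versioninfo
using Pkg

# Hypothetical helper: gathers versioninfo() and Pkg.status() into one string.
function environment_report()
    io = IOBuffer()
    versioninfo(io)        # Julia version, OS, CPU, LLVM, threads
    Pkg.status(; io = io)  # packages in the active environment
    return String(take!(io))
end

print(environment_report())
```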

I’ve run into similar situations in multithreaded code. Adding regular forced GC calls fixed the OOM but potentially slowed down the code significantly. This suggests that during multithreaded loops the GC is not working as I expected. This is consistent with getting OOM on a machine with many processors/threads but not on a 2-core system.
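As a rough sketch of what “regular forced GC” can look like, the loop below processes work in batches and forces a full collection between batches. The function `expensive_step`, the batch size, and the workload are stand-ins invented for the example, not the code discussed in this thread.

```julia
using Base.Threads: @threads

# Hypothetical stand-in for the real per-item work: allocates a temporary
# array and returns a small result.
expensive_step(n::Integer) = sum(abs2, rand(n))

function process_in_batches(sizes::AbstractVector{<:Integer}; batch_size::Integer = 4)
    results = Vector{Float64}(undef, length(sizes))
    for batch in Iterators.partition(1:length(sizes), batch_size)
        @threads for i in batch
            results[i] = expensive_step(sizes[i])
        end
        GC.gc()  # full collection between batches, outside the threaded loop
    end
    return results
end
```

A smaller `batch_size` collects more often and keeps peak memory lower, at the cost of more time spent in the GC.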

Maybe unrelated but I thought worth mentioning?

Thanks for your quick answers. Of course I should have checked Julia’s versions. On macOS, I was using 1.7.2:

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.2.0)
  CPU: Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)

and on Ubuntu I was using 1.9.2:

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 16 virtual cores

The package status output is not necessarily very informative since I am using custom, unregistered packages from a private GitHub repository.

Now, the question prompted me to try using the same version of Julia (I should have tried this first!), and the good news is that the code runs just fine on 1.7.2 on both systems! I will try to investigate exactly at which version things changed (and whether I can reproduce the problem on macOS too).

Note that calling GC.gc() did not change the behaviour much on Julia 1.9.2 (I did not test whether anything changed at all, but I got the same OOM error). Changing to single-core operation also produced the same problem…

In any case, thanks for your help!


Can you try to add the following code to those of your functions that allocate a lot:

if Sys.free_memory()/2^30 < 6.0
    GC.gc()
end

This solved a similar issue for me. See also: OOM despite `--heap-size-hint` · Issue #50658 · JuliaLang/julia · GitHub
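The free-memory check above can be wrapped in a small helper so that it is easy to call from several functions. This is only a sketch: the name `gc_if_low_memory` is made up, and the 6 GiB default simply mirrors the threshold used above.

```julia
# Hypothetical helper: forces a collection when the OS reports little free memory.
# Returns true if a collection was triggered, false otherwise.
function gc_if_low_memory(min_free_gib::Real = 6.0)
    free_gib = Sys.free_memory() / 2^30  # bytes -> GiB
    if free_gib < min_free_gib
        GC.gc()
        return true
    end
    return false
end
```

Calling `gc_if_low_memory()` at the top of an allocation-heavy function keeps the cost of forced collections limited to the moments when memory is actually tight.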

If this does not solve the issue for you, it might be this one: glibc is optimized for memory allocations microbenchmarks · Issue #42566 · JuliaLang/julia · GitHub
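If the glibc allocator turns out to be what holds on to the memory, one thing that can be tried is asking glibc to return freed pages to the OS with `malloc_trim` after a collection. A minimal sketch, assuming Linux with glibc; the helper name is made up, and whether this helps in this particular case is untested.

```julia
# Hypothetical helper: runs a full GC, then asks glibc to release free heap pages.
# Assumes glibc; malloc_trim is a glibc extension and is not available on other libcs.
function trim_native_heap()
    GC.gc()
    @static if Sys.islinux()
        # int malloc_trim(size_t pad); returns 1 if memory was released to the OS
        return ccall(:malloc_trim, Cint, (Csize_t,), 0) == 1
    else
        return false  # no-op on macOS and Windows
    end
end
```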


Yes, sort of good, but then it is a regression on the supported 1.9.x. You could try master or the just-released 1.10-beta1. I believe there’s been some good work done on the GC, in 1.10, on master, or maybe both. Julia 1.7.2 will of course still work even if it is no longer officially supported. If you insist on a supported version, you could also try Julia 1.6 LTS. It’s technically still claimed to be supported, though I believe 1.10 will be the next LTS, possibly soon, with 1.6 dropped shortly after.

I would at least try master (if you want to help Julia development by confirming that it works there) since I also see:

I’m having the same issue since last week: my code runs fine on macOS (it uses less than 5 GB of RAM) while the RAM usage explodes on Ubuntu.
One of my colleagues seems to also have this issue on a completely different code.

Which Julia version? Did you enable zram on Ubuntu?

sudo apt install zram-config

I’m on v1.9.3. Switching to v1.10.0-beta2 seems to fix the issue.
