Instead of an OutOfMemoryError being thrown, my OS (Ubuntu) is killing the julia process

I am writing a process that loads a file into memory, does some operations, and then moves on to the next file. Some large files do not fit into my RAM, and I would like to catch the OOM exception and skip the file. However, my OS kills the julia process before it can handle the error.

I am using Julia 1.6.0 and Ubuntu 20.04. I have already tried setting the Linux overcommit_memory parameter to 2 (always check whether memory is available before allocating), but that made it impossible to open my browser or editor.

Out of memory is one of what Eric Lippert calls fatal exceptions:

Fatal exceptions are not your fault, you cannot prevent them, and you cannot sensibly clean up from them. They almost always happen because the process is deeply diseased and is about to be put out of its misery.

I am not sure I entirely agree, but it is indeed the case that you can’t reliably clean up from them, because the process is too “diseased” to reliably do anything when it is in that state.

Something you can do is isolate that work into a separate process, and then recover from that other process being killed.
Odds are very high that the OOM killer will kill that one instead.

Parallelism.jl’s robust_pmap uses that trick.
I use that to achieve optimal parallelism on memory-bound processes:
start way too many processes, then let the OOM killer kill off processes until you have enough memory.
(People say I am weird for considering the OOM killer a pal, but it is.)
You don’t want to literally use that here, since it will keep retrying that file until all your workers have died, rather than skipping it,
but you can build your own thing with Distributed.remotecall and catch ProcessExitedError, along the lines of the sketch below.
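
Something like this (a rough sketch, not Parallelism.jl’s actual implementation; process_one_file stands in for whatever your per-file work is, and a named function needs to be defined on the workers, e.g. via @everywhere):

using Distributed

function process_skipping_oom(files, process_one_file)
    results = Dict{String,Any}()
    for file in files
        pid = only(addprocs(1))  # a fresh worker for each file
        try
            results[file] = remotecall_fetch(process_one_file, pid, file)
        catch err
            # If the OOM killer takes out the worker, the fetch throws
            # ProcessExitedError; treat that as "skip this file".
            err isa ProcessExitedError || rethrow()
            @warn "Worker died (probably OOM-killed); skipping" file
        finally
            pid in workers() && rmprocs(pid)  # clean up if it survived
        end
    end
    return results
end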

Though perhaps more pragmatically, you can use filesize to check the size of the file before you open it, and just not try to open files that are bigger than some hardcoded constant?
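
For instance (files and process_one_file are placeholders for your own list and per-file function, and the 2 GiB threshold is a made-up number to tune for your machine):

const MAX_FILE_BYTES = 2 * 2^30  # filesize reports bytes

for file in files
    if filesize(file) > MAX_FILE_BYTES
        @warn "Skipping file that is too large to load" file
        continue
    end
    process_one_file(file)
end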

Couldn’t the julia runtime theoretically do something about it, though? Monitor how close to the limit you are whenever you allocate, then when you get close, call GC.gc(), unwind the stack, etc… It feels like there ought to be a way to recover enough memory so that the exception can be thrown “normally” (at least from the user’s POV).

Maybe theoretically, but practically it is very hard, and I think basically impossible to do reliably in all cases.
Especially since other unrelated processes could be taking up memory unexpectedly as you do it.
Julia definitely does do something like that in some cases.
e.g.

julia> [1 for ii in 1:2^40]
ERROR: OutOfMemoryError()

works pretty consistently.

I suspect that for FileIO the OOM happens inside a ccall, where the julia runtime has only limited ability to do anything.
(Albeit a ccall hitting a function that is written as part of the julia stdlib.)

Reading the file as a string is not the problem; I am parsing it and loading it into a vector of closure functions. Fairly small files end up taking several dozen gigabytes of memory. Are closures particularly memory intensive?

It seems to me like this is fixable. Can you provide some more details?

Closures are not particularly memory intensive.
But it is easy to close over things you don’t intend to, if you are not careful.
Say a closure calls size on a variable from its parent scope: that will keep the whole variable alive in memory for as long as the closure exists.

For example:

julia> make_sizer(A) = ()->size(A)
make_sizer (generic function with 1 method)

julia> sizer = make_sizer(ones(1000,1000))
#9 (generic function with 1 method)

julia> Base.summarysize(sizer)
8000048

julia> sizer.A
1000×1000 Matrix{Float64}:

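One way to avoid keeping the array alive (a sketch; whether it helps depends on what your closures actually need) is to compute the value eagerly, so the closure captures only a small tuple rather than the whole array:

make_small_sizer(A) = (sz = size(A); () -> sz)

small_sizer = make_small_sizer(ones(1000, 1000))
Base.summarysize(small_sizer)  # a few dozen bytes at most, not ~8 MB
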
It is more common than you might think for a string to use less memory than its parsed form.
Consider:

julia> str = "1 2 3 4";

julia> Base.summarysize(str)
15

julia> xs = parse.(Int64, split(str))
4-element Vector{Int64}:
 1
 2
 3
 4

julia> Base.summarysize(xs)
72

Each number in the string takes 1 byte, plus 1 byte for each space, plus some small overhead for the String itself: 7 bytes of character data plus 8 bytes of overhead gives the 15 reported above.
But once parsed into Int64, each number takes 8 bytes, and the Vector adds its own overhead: 4 × 8 = 32 bytes of data plus 40 bytes of Vector overhead gives the 72.