OutOfMemoryError() instead of the job scheduler killing the job for allocating too much memory

Hi everyone,
A few times now, my code running interactively on an LSF cluster has hit the memory limit I requested for the job. That limit is lower than the node's total memory, so Julia does not throw an OutOfMemoryError() as it would if I had allocated the whole node or were running on a desktop machine. Instead, the job scheduler terminates the job for exceeding its memory allocation.

Is there a way to give Julia a maximum memory limit, e.g. an environment variable JULIA_MAX_MEMORY in analogy to JULIA_NUM_THREADS, so that the job is not terminated?

Thanks for any hints.

I do this as I described in an old post here. In addition, just add something like:

# MEMORY CONSUMPTION TEST
# Returns true while `mb` (current usage in MB) is within `memlimit` (in MB).
function memOK(mb::Int, memlimit::Int)
    if mb > memlimit
        println("### Memory usage too high: $mb MB (limit: $memlimit MB)")
        return false
    end
    return true
end

and terminate your program if the latter function returns false.
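
As a rough sketch of what "terminate if it returns false" could look like in a loop: the names `run_with_limit`, `work`, and `measure` below are hypothetical and only for illustration. `measure` stands for any function that returns the current memory use in MB.

```julia
# Same check as above, repeated so that this sketch is self-contained.
memOK(mb::Int, memlimit::Int) = mb <= memlimit

# Hypothetical driver: calls `work(i)` up to `nsteps` times and stops early
# once `measure()` (current memory use in MB) exceeds `memlimit`.
function run_with_limit(work, measure, memlimit::Int, nsteps::Int)
    for i in 1:nsteps
        if !memOK(measure(), memlimit)
            println("### Stopping before step $i: limit of $memlimit MB exceeded")
            return i - 1              # number of completed steps
        end
        work(i)
    end
    return nsteps
end

# Toy usage: pretend that every step adds 100 MB to a counter.
used = Ref(0)
completed = run_with_limit(i -> (used[] += 100), () -> used[], 350, 10)
```

Stopping before the next step starts, rather than in the middle of one, leaves room to save intermediate results before exiting.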

Thanks for the reference, I hadn’t found that. It already looks interesting, and I will check whether it makes sense to use it in my case :slight_smile:

Wait, I found a mistake in the post I linked. You need to close the file in the referenced function. Otherwise you may get an error like “too many open files” if you call the function very often. For convenience, here is the repaired version:

function get_mem_use()
    # The do-block form closes the file even if reading throws.
    s = open("/proc/self/stat", "r") do f
        read(f, String)
    end
    # Field 23 of /proc/self/stat is vsize, the virtual memory size in bytes.
    vsize = parse(Int, split(s)[23])
    return ceil(Int, vsize / (1024 * 1024))    # MB, rounded up
end

I use this mainly in callbacks in JuMP.jl. Clearly, this method cannot avoid every OutOfMemoryError() if you set your “julia memlimit” too close to your physical limit. However, it works fine for me, and I get these errors very rarely now.
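
One possible way to keep the limit away from the hard limit is to derive it from the amount you requested, minus some headroom. This is only a sketch: the environment variable `MY_JOB_MEM_MB` is hypothetical (something you would export yourself in the job script, not something Julia or LSF sets), and `soft_limit_mb` and the 80 % margin are assumptions chosen for illustration.

```julia
# Hypothetical helper: derive the in-program limit (in MB) from a
# user-exported environment variable, keeping some headroom below it.
function soft_limit_mb(; var::AbstractString = "MY_JOB_MEM_MB",
                         default_mb::Int = 4096,
                         margin_percent::Int = 80)
    # Fall back to `default_mb` when the variable is not set.
    allocated = parse(Int, get(ENV, var, string(default_mb)))
    # Integer arithmetic avoids floating-point rounding surprises.
    return allocated * margin_percent ÷ 100
end

memlimit = soft_limit_mb()
```

The resulting `memlimit` can then be passed as the second argument of the check function.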

There is no such file or directory.

So, how can that ever work?

This works on Linux, see here.
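
To make the Linux dependence explicit, the read could be guarded so that other platforms get `nothing` instead of an error. This is a sketch, and `try_get_mem_use` is a hypothetical name. It also splits the line after the closing parenthesis of the process name, in case that name contains spaces.

```julia
# Hypothetical wrapper: returns the virtual memory size in MB, or `nothing`
# when /proc/self/stat is unavailable (e.g. on macOS or Windows).
function try_get_mem_use()
    path = "/proc/self/stat"
    (Sys.islinux() && isfile(path)) || return nothing
    s = read(path, String)            # opens and closes the file itself
    # Field 2 (the process name) is wrapped in parentheses and may contain
    # spaces, so only the part after the last ')' is split.
    idx = findlast(==(')'), s)
    rest = split(s[idx+1:end])
    # `rest[1]` is field 3 of the line, so field 23 (vsize, bytes) is `rest[21]`.
    vsize = parse(Int, rest[21])
    return ceil(Int, vsize / (1024 * 1024))
end

mb = try_get_mem_use()
mb === nothing && println("Memory check unavailable on this platform")
```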