Weird error: `SystemError: memory mapping failed: Too many open files in system`

I am trying to run something in parallel. I am using ClusterManagers.jl to connect to a cluster and launch the workers, roughly along these lines (SlurmManager here is just an assumption about my scheduler):
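
using Distributed, ClusterManagers
# launch 32 workers on the allocated compute nodes (hypothetical call; the
# actual manager and options depend on the scheduler)
addprocs(SlurmManager(32))

With the workers up, suppose I have the following module loaded on all of them.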

@everywhere module ldf
    using CSV, DataFrames

    function read_df()
        validstates = ("...")   # the 50 valid state names (elided here)
        # create a return vector with one DataFrame per state
        df_of_states = Array{DataFrame, 1}(undef, 50)
        for (i, vs) in enumerate(validstates)
            fn = "..."          # CSV filename built from `vs` (elided here)
            df_of_states[i] = CSV.File(fn, header=false) |> DataFrame
        end
        return df_of_states
    end
    export read_df

    const dfs = read_df()
    export dfs
end

The line const dfs = read_df() makes it so that when the @everywhere is run on all the workers, the CSV files are read and stored in a const variable. I want to make use of these dataframes in other functions. The size of each dataframe is 20000 x 141.
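
For context, this is roughly how the per-worker copies get used; summarize_state is a hypothetical stand-in for the real work, and the point is just that each worker indexes into its own ldf.dfs:

using Distributed

# hypothetical consumer: every worker reads from its local copy in ldf.dfs
@everywhere function summarize_state(i::Int)
    df = ldf.dfs[i]
    return size(df, 1)    # placeholder computation
end

results = pmap(summarize_state, 1:50)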

So now suppose that on node001, where 32 workers are launched, each worker has a separate copy of ldf, each creating a variable called dfs and reading those files… I get the following error:

 On worker 2:
│    SystemError: memory mapping failed: Too many open files in system
│    #systemerror#51 at ./error.jl:168
│    #systemerror#50 at ./error.jl:167 [inlined]
│    systemerror at ./error.jl:167 [inlined]
│    #mmap#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:209
│    #mmap#14 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
│    mmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
│    parsetape at /home/affans/.julia/packages/CSV/vyG0T/src/file.jl:474
│    #File#28 at /home/affans/.julia/packages/CSV/vyG0T/src/file.jl:252
│    read_df at /home/affans/lancetid_actualinfection_post.jl:33
│    top-level scope at /home/affans/lancetid_actualinfection_post.jl:39
│    eval at ./boot.jl:331

It’s a fairly obvious error… I am opening too many files, but I was wondering what the root cause of this is. It’s not an out-of-memory error (each worker has almost 4 GB of RAM to work with, out of 128 GB total).

The solution for me here is to move read_df() back to the head node and just pass the relevant data to the workers… but I just wanted to get some insight on this.
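
A minimal sketch of that workaround (process_state is a hypothetical placeholder for whatever the workers actually compute, defined with @everywhere as before):

using Distributed, CSV, DataFrames

dfs = read_df()    # read all 50 files once, on the head node only

# each pmap task ships exactly one dataframe to a worker, instead of every
# worker opening all of the files itself
results = pmap(process_state, dfs)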


I’m assuming you are on Linux. You might want to check /proc/sys/fs/file-max. That should be the maximum number of file descriptors that can be open (in the system) at once. (On my system that is 9223372036854775807, but maybe it’s smaller on yours.)

You should also check /proc/sys/fs/file-nr; the first number is the current number of file descriptors in use.
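
If it’s easier, the same numbers can be read from within Julia (Linux only; /proc/sys/fs/file-nr holds three fields: allocated, unused, maximum):

used  = first(split(read("/proc/sys/fs/file-nr", String)))
limit = strip(read("/proc/sys/fs/file-max", String))
println("file handles in use: $used / $limit")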


Can you log into one of your compute nodes, or submit a simple batch job and run

ulimit -a

I just did this on my Fedora laptop:
open files (-n) 1024
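
To see the limit each Julia worker actually inherits, something like this from the driver session should print it on every process (untested sketch):

using Distributed
@everywhere println("worker ", myid(), ": open files limit = ",
                    readchomp(`sh -c "ulimit -n"`))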
