"memory mapping failed" when reading many CSVs

I use CSV.jl for reading many small CSV files, and at some point always encounter the following: “ERROR: SystemError: memory mapping failed: Cannot allocate memory”. Of course, there is plenty of free memory at the time. An easy way to reproduce is the following:

using CSV, DataFrames

# create CSV file
CSV.write("tmp.csv", DataFrame(a=1:1000, b=1:1000))

# read it many times
[CSV.read("tmp.csv", use_mmap=false) for i in 1:1000];
# running the previous line several times reliably gives the error for me, no matter whether `use_mmap` is true or false

I already submitted this as a bug report to CSV.jl, but maybe this is a deeper issue in how memmap works in Julia? I’m not an expert in system programming, so it’s hard to tell.

Linux, Julia 1.2.0, CSV 0.5.11.


You can try CSVFiles.jl:

using CSVFiles, DataFrames

[DataFrame(load("tmp.csv")) for i in 1:1000]

I can’t replicate it on Windows. I set i in 1:100_000 and still got no error. On Windows I usually only get an error when I read a file so large that it uses up all my RAM (64 GB).

I don’t have a Windows machine with Julia installed, but the error reproduces on two completely different Linux machines, with different Julia versions (1.2 and 1.3).

Your user memory limits are probably too low; you (or your admin) can try raising them with the ulimit command to see if that helps.

Which limit are you talking about? ulimit output looks reasonable to me:

# machine 1:
➜  ~ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       unlimited
-n: file descriptors                1048576
-l: locked-in-memory size (kbytes)  64
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 256907
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

# machine 2
➜  ~ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       79276
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  65536
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 79276
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited

Also, no matter what the limits are, if it can read one CSV file then it should be able to read however many of those fit into RAM without issues, right? The machines have 20 GB and 128 GB of RAM.

Hmm, maybe I’m misremembering where the appropriate mmap limits are (maybe in security.conf?). Either way, my bet is on this being a resource limit issue rather than a CSV.jl issue, since you still technically have free memory available.

Ok, turns out there is indeed a mmap limit:

➜  ~ sudo sysctl vm.max_map_count
vm.max_map_count = 65530

After increasing it tenfold with

➜  ~ sudo sysctl -w vm.max_map_count=655300
vm.max_map_count = 655300

I cannot reproduce that error anymore, even with way more CSV reads. So, my particular problem seems to be solved.
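For what it’s worth, the current limit can also be checked without root, straight from Julia (Linux only; it just reads the procfs entry):

# prints the current vm.max_map_count (Linux only)
print(read("/proc/sys/vm/max_map_count", String))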

However, I think it’s still a CSV.jl issue for two reasons:

  • It doesn’t seem to clean up (?) mmapped regions correctly, because otherwise it should use the same amount of mmap resources no matter whether I read 1 file or 10’000 - the reads are completely independent (see the diagnostic sketch after this list).
  • I explicitly specify use_mmap = false, so it shouldn’t depend on mmap at all, right?
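
For anyone who wants to check the first point themselves, here is a rough diagnostic sketch (Linux-only, it relies on /proc and is not part of any CSV.jl API): it counts the live memory mappings of the current Julia process before and after a batch of reads. If the count keeps growing even after a GC, mappings are indeed not being released:

using CSV, DataFrames

# number of memory mappings currently held by this process (Linux only)
count_maps() = countlines("/proc/self/maps")

before = count_maps()
for _ in 1:100
    CSV.read("tmp.csv")
end
GC.gc()
after = count_maps()
@show before after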

Just contributing a bit more information.

I just faced the same issue with Julia 1.2 (Linux), CSV v0.5.12 and I can confirm that aplavin’s solution to increase vm.max_map_count does indeed work.

BTW that “bug” did not occur back in April 2019 when I wrote that specific piece of code. Unfortunately, I am not exactly sure which versions of Julia or CSV.jl I was using at that point, most likely the latest stable versions at the time. Otherwise my Linux system was pretty much the same, apart from the distro’s updates (Arch Linux) since then.

CSV.jl uses mmap in two ways: 1) to map the CSV file itself into memory and read from it (you can disable that via use_mmap=false); 2) it allocates one new mmapped vector for each column you read, and a reference to each of these mmapped vectors is kept by, say, the DataFrame that you materialize into. Maybe you can avoid that by using copycols=true, but I’m not sure. In any case, I bet that this second part is the source of the problem here. It is not clear to me why one would do 2).
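
To illustrate what 2) looks like at the OS level, here is a minimal sketch of anonymous mmapped vectors via Julia’s Mmap stdlib (the element type, length, and count are made up for illustration; this is not CSV.jl’s actual code). Each such vector is a separate memory mapping, so one mapping per column, per file, can run into vm.max_map_count long before RAM is exhausted:

using Mmap

# each call creates one new anonymous memory mapping in the process
cols = [Mmap.mmap(Mmap.Anonymous(), Vector{UInt64}, (10_000,), 0) for _ in 1:50]

# as long as any of these vectors is still referenced (e.g. from a DataFrame
# column), its mapping cannot be released
length(cols)  # 50 live mappings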

CSVFiles.jl only uses memory mapped files for mapping the actual CSV file into memory (i.e. what CSV.jl uses as 1)), and I’ve never had any problem with that.

Hi,

I just ran into the problem reported here with only one “small” table of 62 MB (4001×4933). The very strange thing is that I did this test on 3 computers:
1- Linux Manjaro with 8 GB RAM
2- Linux Manjaro with 64 GB RAM
3- A supercomputer node with 192 GB RAM (CentOS, if I remember correctly).

On all 3 computers I have the latest Julia (1.4.1) with the packages up to date. And the very strange result is that I cannot read the small table on any computer except the small one (see results below; I cannot share the file because it contains data - it is a dataframe of floats (mainly 0.0), with the first row and first column containing strings and the last column containing integers).

I did not try the solution reported here because my goal is to use the software on a supercomputer without root access.

Disabling mmap does not solve anything.

I gave CSVFiles a try; unfortunately it uses much more memory than CSV (more than 2×), so my small computer starts to swap and I have to kill Julia :(

I think it is a serious problem because my table is not that large, and on a computer without root access there is currently no solution. The problem is difficult to observe because it is not directly related to the size of the table, and it shows up on some computers and not others, even if the package versions are the same…

Here are the results of my tests. I hope it will help.

On the computer Linux Manjaro with 64 GB RAM:
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
  [336ed68f] CSV v0.6.2
  [861a8166] Combinatorics v1.0.1
  [a93c6f00] DataFrames v0.21.0

julia> x = CSV.read(file; delim ="\t", header=true, use_mmap=false)
ERROR: TaskFailedException:
SystemError: memory mapping failed: Ne peut allouer de la mémoire
Stacktrace:
 [1] systemerror(::String, ::Int32; extrainfo::Nothing) at ./error.jl:168
 [2] #systemerror#50 at ./error.jl:167 [inlined]
 [3] systemerror at ./error.jl:167 [inlined]
 [4] mmap(::Mmap.Anonymous, ::Type{Array{UInt64,1}}, ::Tuple{Int64}, ::Int64; grow::Bool, shared::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:209
 [5] #mmap#14 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [6] mmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [7] parsetape(::Val{false}, ::Int64, ::Dict{Int8,Int8}, ::Array{Array{UInt64,1},1}, ::Array{Array{UInt64,1},1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{Dict{String,UInt64},1}, ::Array{UInt64,1}, ::Int64, ::Array{Int8,1}, ::Array{Int64,1}, ::Bool, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing) at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:474
 [8] macro expansion at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:313 [inlined]
 [9] (::CSV.var"#39#42"{Array{Int8,1},Array{UInt8,1},Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Nothing,Float64,Int64,Dict{Int8,Int8},Int64,Bool,Int64,Array{Int64,1},Int64,Array{Int64,1},Array{Array{Array{UInt64,1},1},1},Array{Array{Array{UInt64,1},1},1},Array{Array{Dict{String,UInt64},1},1},Array{Array{UInt64,1},1},Array{Array{Int8,1},1},Array{Array{Int64,1},1},Int64})() at ./threadingconstructs.jl:126

...and 20 more exception(s).

x = CSV.read(file; delim ="\t", header=true)
ERROR: TaskFailedException:
SystemError: memory mapping failed: Ne peut allouer de la mémoire
Stacktrace:
 [1] systemerror(::String, ::Int32; extrainfo::Nothing) at ./error.jl:168
 [2] #systemerror#50 at ./error.jl:167 [inlined]
 [3] systemerror at ./error.jl:167 [inlined]
 [4] mmap(::Mmap.Anonymous, ::Type{Array{UInt64,1}}, ::Tuple{Int64}, ::Int64; grow::Bool, shared::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:209
 [5] #mmap#14 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [6] mmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [7] parsetape(::Val{false}, ::Int64, ::Dict{Int8,Int8}, ::Array{Array{UInt64,1},1}, ::Array{Array{UInt64,1},1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{Dict{String,UInt64},1}, ::Array{UInt64,1}, ::Int64, ::Array{Int8,1}, ::Array{Int64,1}, ::Bool, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing) at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:469
 [8] macro expansion at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:313 [inlined]
 [9] (::CSV.var"#39#42"{Array{Int8,1},Array{UInt8,1},Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Nothing,Float64,Int64,Dict{Int8,Int8},Int64,Bool,Int64,Array{Int64,1},Int64,Array{Int64,1},Array{Array{Array{UInt64,1},1},1},Array{Array{Array{UInt64,1},1},1},Array{Array{Dict{String,UInt64},1},1},Array{Array{UInt64,1},1},Array{Array{Int8,1},1},Array{Array{Int64,1},1},Int64})() at ./threadingconstructs.jl:126

...and 22 more exception(s).

julia> x = CSV.read(file; delim ="\t")
ERROR: LLVM ERROR: out of memory

signal (6): Abandon
in expression starting at REPL[3]:0
LLVM ERROR: out of memory

signal (6): Abandon
in expression starting at REPL[3]:0
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
LLVM ERROR: out of memory



On the computer Linux Manjaro with only 8 GB RAM:
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
  [336ed68f] CSV v0.6.2
  [5d742f6a] CSVFiles v0.16.0
  [861a8166] Combinatorics v1.0.1
  [a93c6f00] DataFrames v0.21.0
  [31c24e10] Distributions v0.22.6
  [09f84164] HypothesisTests v0.10.0
  [b1bec4e5] LIBSVM v0.4.0
  [2fda8390] LsqFit v0.10.0
  [5fb14364] OhMyREPL v0.5.5
  [91a5bcdd] Plots v1.2.2
  [1a8c2f83] Query v0.12.2
  [ce6b1742] RDatasets v0.6.1
  [f2b01f46] Roots v1.0.1
  [b8865327] UnicodePlots v1.2.0


julia> x = CSV.read(file; delim ="\t")
4001×4933 DataFrame. Omitted printing of 4922 columns

Hi!

I found the source of the bug: it is not in mmap, it is in multithreading (and probably hyperthreading, because my small computer does not have HT, whereas the other two do). So a temporary solution is to disable multithreading using
threaded=false
With this option it is possible to read the table on any computer :)
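
As an aside, if you prefer not to pass threaded=false to every call, starting Julia with a single thread (e.g. launching it with JULIA_NUM_THREADS=1) should avoid the problem as well, since as far as I understand CSV.jl only takes the multithreaded parsing path when Threads.nthreads() > 1:

julia> Threads.nthreads()   # started with JULIA_NUM_THREADS=1
1

julia> x = CSV.read(file; delim ="\t");   # single-threaded parsing path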

On the computer Linux Manjaro with 64 GB RAM:
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
  [336ed68f] CSV v0.6.2
  [861a8166] Combinatorics v1.0.1
  [a93c6f00] DataFrames v0.21.0

x = CSV.read(file; delim ="\t", threaded=false)
4001×4933 DataFrame. Omitted printing of 4922 columns

x = CSV.read(file; delim ="\t", threaded=true)
ERROR: TaskFailedException:
SystemError: memory mapping failed: Ne peut allouer de la mémoire
Stacktrace:
 [1] systemerror(::String, ::Int32; extrainfo::Nothing) at ./error.jl:168
 [2] #systemerror#50 at ./error.jl:167 [inlined]
 [3] systemerror at ./error.jl:167 [inlined]
 [4] mmap(::Mmap.Anonymous, ::Type{Array{UInt64,1}}, ::Tuple{Int64}, ::Int64; grow::Bool, shared::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:209
 [5] #mmap#14 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [6] mmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
 [7] parsetape(::Val{false}, ::Int64, ::Dict{Int8,Int8}, ::Array{Array{UInt64,1},1}, ::Array{Array{UInt64,1},1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{Dict{String,UInt64},1}, ::Array{UInt64,1}, ::Int64, ::Array{Int8,1}, ::Array{Int64,1}, ::Bool, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing) at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:469
 [8] macro expansion at /home/fred/.julia/packages/CSV/vyG0T/src/file.jl:313 [inlined]
 [9] (::CSV.var"#39#42"{Array{Int8,1},Array{UInt8,1},Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Nothing,Float64,Int64,Dict{Int8,Int8},Int64,Bool,Int64,Array{Int64,1},Int64,Array{Int64,1},Array{Array{Array{UInt64,1},1},1},Array{Array{Array{UInt64,1},1},1},Array{Array{Dict{String,UInt64},1},1},Array{Array{UInt64,1},1},Array{Array{Int8,1},1},Array{Array{Int64,1},1},Int64})() at ./threadingconstructs.jl:126

...and 21 more exception(s).