I’m trying to read a large CSV file, one row at a time.
I want to do something like:
using CSV
csvFile = CSV.File(infile)
for row in csvFile
    # do stuff with row
end
However, on Windows 10 with Julia v1.1.1, I’m getting this error during CSV.File:
ERROR: could not create file mapping: The operation completed successfully.
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] #mmap#1(::Bool, ::Bool, ::Function, ::Mmap.Anonymous, ::Type{Array{UInt64,1}}, ::Tuple{Int64}, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:218
[3] #mmap at .\none:0 [inlined]
[4] #mmap#14 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:251 [inlined]
[5] mmap at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:251 [inlined]
[6] file(::String, ::Int64, ::Bool, ::Int64, ::Nothing, ::Int64, ::Int64, ::Bool, ::Nothing, ::Bool, ::Array{String,1}, ::String, ::Nothing, ::Bool, ::Char, ::Nothing, ::Nothing, ::Char, ::Nothing, ::UInt8, ::Nothing, ::Nothing, ::Nothing, ::Nothing, ::Dict{Int8,Int8}, ::Bool, ::Float64, ::Bool, ::Bool, ::Bool, ::Bool, ::Nothing) at \\W43237\C$\Users\plowman\.julia\packages\CSV\IwqOm\src\CSV.jl:278
[7] #File#20 at \\W43237\C$\Users\plowman\.julia\packages\CSV\IwqOm\src\CSV.jl:158 [inlined]
[8] Type at \\W43237\C$\Users\plowman\.julia\packages\CSV\IwqOm\src\CSV.jl:158 [inlined]
[9] top-level scope at .\REPL[5]:6 [inlined]
[10] top-level scope at .\none:0
This is interesting because:
- The error message says the operation completed successfully.
- On Windows, the CSV default is to not use mmap.
A little further investigation shows that the error occurs for large files only. The code below is successful for rows = 10^6, 10^7, 10^8, but fails for rows = 10^9.
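using CSV, DataFrames

cols = 3
for rows in (10^6, 10^7, 10^8, 10^9)
    filename = string("test_", rows, "x", cols, ".csv")
    # Random Float64 table; newer DataFrames releases need DataFrame(rand(rows, cols), :auto)
    df = DataFrame(rand(rows, cols))
    println("Writing ", filename)
    CSV.write(filename, df)
    # Fails with the file mapping error above when rows = 10^9
    CSV.File(filename)
end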
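One thing I noticed in the trace: frame [2] is an anonymous mmap of an Array{UInt64,1}, so some internal buffer seems to be memory-mapped even though the file itself is not. Here is a rough sketch of how large such a buffer could get. It assumes one UInt64 slot per cell, which is my guess and not something I have confirmed from the CSV.jl source. estimate_gib is a hypothetical helper, not part of any package.

```julia
# Rough size estimate for an anonymous Vector{UInt64} mapping.
# ASSUMPTION: one UInt64 slot per cell (rows * cols); the real buffer
# length used inside CSV.File is not visible in the stack trace.
cols = 3
slot_bytes = sizeof(UInt64)  # 8 bytes per slot

# Hypothetical helper (not a CSV.jl function): buffer size in GiB
estimate_gib(rows, cols) = rows * cols * slot_bytes / 2^30

for rows in (10^6, 10^7, 10^8, 10^9)
    println(rows, " rows: ~", round(estimate_gib(rows, cols); digits = 2), " GiB")
end
```

Under that assumption, the 10^9-row case would need a mapping of roughly 22 GiB, about ten times the 10^8-row case. That might explain why only the largest file fails, if the mapping exceeds what Windows will grant.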