I’m using CSV.jl to iterate over a very large CSV file, but on Windows it throws this error:
Error: could not create file mapping: The operation completed successfully
This is the same error described in this issue:
https://github.com/JuliaData/CSV.jl/issues/424
The error appears with both CSV.File and CSV.Rows, so I suspect it is not possible to iterate over very large CSV files on Windows this way. Is there another option for this task?
I don’t need type inference; I can specify the name and type of every column, and the separator if necessary.
using Pkg
Pkg.add("CodecZlib")
Pkg.add("CSV")
Pkg.add("Queryverse")

using Query, CSV, CodecZlib, Dates

function mylength(iter)
    n = 0
    for i in iter
        n += 1
    end
    return n
end

function test(src::String)
    open(GzipDecompressorStream, src) do stream
        table = CSV.Rows(stream;
            header=[:day, :glnprovider, :glnretailerlocation, :gtin, :inventory, :cost, :sales, :price],
            dateformat="yyyy-mm-dd",
            types=[Date, UInt64, UInt64, UInt64, Float32, Float32, Float32, Float32],
            strict=true)
        return table |> mylength
    end
end

test("D:\\Data\\03012019_03312019_17440.csv.gz")
Iterating with CSV.Rows should have a low memory footprint and handle large files.
However, if you are having issues (limited memory?), you could try streaming with basic primitives. In your case, the CSV file seems to have known types and is not complex to parse.
So perhaps you could try something like:
using Dates

# Column types, in file order
const types = [Date, UInt64, UInt64, UInt64, Float32, Float32, Float32, Float32]
const testfile = "testfile.csv"

# Write a small sample file to test against
function testwrite(filename::AbstractString)
    open(filename, "w") do io
        println(io, "2019-08-23, 5, 123, 17, 13.5, 1200.34, 1500.80, 22.30")
        println(io, "2019-08-22, 4, 122, 16, 12.5, 1100.34, 1400.80, 21.30")
    end
end

# Stream the file one line at a time and parse each field to its known type
function testread(filename::AbstractString)
    nfields = length(types)
    for line in eachline(filename)
        strings = split(line, ",")
        fields = ntuple(i -> parse(types[i], strings[i]), nfields)
        println(fields)
        # do stuff with fields
    end
end

testwrite(testfile)
testread(testfile)
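Since the file in the question is gzip-compressed, the same idea can be written against any IO object instead of a filename. The following is only a minimal sketch: `COLUMN_TYPES`, `parse_row`, and `count_rows` are hypothetical names made up for illustration, and it assumes a plain comma-separated layout with no quoted fields and no header row.

```julia
using Dates

# Column types from the question, in file order (assumed to match your data).
const COLUMN_TYPES = (Date, UInt64, UInt64, UInt64, Float32, Float32, Float32, Float32)

# Hypothetical helper: parse one comma-separated line into a tuple of typed fields.
function parse_row(line::AbstractString)
    strings = split(line, ',')
    length(strings) == length(COLUMN_TYPES) ||
        error("expected $(length(COLUMN_TYPES)) fields, got $(length(strings))")
    # strip() tolerates optional spaces around each field
    return ntuple(i -> parse(COLUMN_TYPES[i], strip(strings[i])), length(COLUMN_TYPES))
end

# Hypothetical helper: stream rows from any IO, holding one line in memory at a time.
function count_rows(io::IO)
    n = 0
    for line in eachline(io)
        isempty(strip(line)) && continue  # skip blank lines
        parse_row(line)                   # do stuff with the parsed fields here
        n += 1
    end
    return n
end

# In-memory sample standing in for the real file
sample = IOBuffer("2019-08-23,5,123,17,13.5,1200.34,1500.80,22.30\n" *
                  "2019-08-22,4,122,16,12.5,1100.34,1400.80,21.30\n")
println(count_rows(sample))  # prints 2
```

For the compressed file, the stream opened with open(GzipDecompressorStream, src) do stream ... end in the question should be usable in place of the IOBuffer, since eachline accepts any IO; I have not tested that against a file of your size.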
Thanks. I can try this, but I will probably use another solution (the idea is to teach others the same tool for this type of task, and being very easy to use is a plus).
I hoped this task would be as easy as using a library. I’m really considering Julia as an ETL tool, because it was easy to read, process, and write data in a streaming fashion with high performance. I suppose the ecosystem is not ready yet.