Hello again. I’m testing reading a large CSV using CSVFiles instead of CSV.jl, because the CSV file is too large for CSV.jl on Windows. But I cannot use the CSVFile as an iterable. I’m trying:
using CSVFiles, FileIO, Dates

# Count the elements of any iterable
function mylength(iter)
    n = 0
    for i in iter
        n += 1
    end
    return n
end

function test(src::String)
    table = load(File(format"CSV", src);
        colnames=[:day, :glnprovider, :glnretailerlocation, :gtin, :inventory, :cost, :sales, :price],
        colparsers=[Date, UInt64, UInt64, UInt64, Float32, Float32, Float32, Float32],
        header_exists=false
    )
    return table |> mylength
end

test("D:\\Data\\03012019_03312019_17440.csv.gz")
I’m hoping to use the table as an iterable to calculate different things in a streaming fashion. But I’m getting the error:
MethodError: no method matching iterate(::CSVFiles.CSVFile)
So I found the developer guide in the IterableTables.jl documentation. In that document, the author says that we can call the method `getiterator`. But when I tried that, the error is:
UndefVarError: getiterator not defined
So, how can I use an iterable table (the CSV file) as an iterator? By getting the iterator somehow?
`getiterator` is defined in IteratorInterfaceExtensions.jl, so you need to load that package.
But be warned: CSVFiles.jl currently reads everything into memory, and then iterates from that. So if `load("foo.csv") |> DataFrame` doesn’t work because of memory limitations, then using `getiterator` will probably also not work (still worth a try, of course).
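A minimal sketch of the `getiterator` route, assuming CSVFiles.jl, FileIO.jl and IteratorInterfaceExtensions.jl are installed. The `column_sum` helper and the small temporary file are hypothetical, made up for illustration; `mylength` is the helper from the question. Since the whole file is still read into memory first, this fixes the `MethodError` but does not reduce peak memory use.

```julia
using CSVFiles, FileIO, IteratorInterfaceExtensions

# Count the elements of any iterable (same helper as in the question).
function mylength(iter)
    n = 0
    for _ in iter
        n += 1
    end
    return n
end

# Hypothetical helper: sum one numeric column, one row at a time.
function column_sum(iter, col::Symbol)
    total = 0.0
    for row in iter              # each row is a NamedTuple
        total += getproperty(row, col)
    end
    return total
end

# Hypothetical small file so the example is self-contained.
path = joinpath(mktempdir(), "example.csv")
write(path, "a,b\n1,2.5\n3,4.5\n")

table = load(File(format"CSV", path))
# The CSVFile object itself has no `iterate` method;
# getiterator returns an object that does (rows as NamedTuples).
it = IteratorInterfaceExtensions.getiterator(table)

n = mylength(it)             # number of rows
total = column_sum(it, :b)   # sum of column b
```

The same call should work on the `table` returned inside the question's `test` function, i.e. `IteratorInterfaceExtensions.getiterator(table) |> mylength`.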
If you don’t need all of the columns of the file, you can try the new skip column feature that I’ve added to TextParse#master: make sure you are using that (`pkg> add TextParse#master`), and then something like `load("foo.csv", colparsers=Dict(:colA=>nothing, :colC=>nothing)) |> DataFrame` should work. In that case, `colA` and `colC` are not being loaded into memory at all.
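Combining the skip-column feature with `getiterator` might look like the sketch below. It assumes a TextParse version that includes the skip-column feature (at the time of this thread only TextParse#master) and that `Symbol` keys in the `colparsers` Dict are accepted, as in the snippet above; the three-column file is hypothetical.

```julia
using CSVFiles, FileIO, IteratorInterfaceExtensions

# Hypothetical three-column file for illustration.
path = joinpath(mktempdir(), "skip_example.csv")
write(path, "colA,colB,colC\n1,2,3\n4,5,6\n")

# A parser of `nothing` marks a column to be skipped during parsing
# (assumes a TextParse version that includes the skip-column feature).
table = load(File(format"CSV", path);
    colparsers=Dict(:colA => nothing, :colC => nothing))

# Iterate row by row; with the feature available, only colB is materialized.
rows = collect(IteratorInterfaceExtensions.getiterator(table))
```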
My next project is to integrate the skip column feature with Query.jl, so that something like `load("foo.csv") |> @select(-:colA, -:colC) |> DataFrame` automatically skips those columns during load. No promise on timing, though.
EDIT: Oh, and I also plan to add a fully streaming mode at some point. Almost all the pieces for that exist already, so it actually shouldn’t be too difficult, but again, no timeline right now.
Thanks for your time. I’ll check just in case.
Great, any feedback on whether it works would be most welcome.