A number of posts have been asking for a CSV chunk reader, and the new major feature in DataConvenience.jl is a reasonably fast chunk reader based on CSV.jl.
See xiaodaigh/DataConvenience.jl on GitHub: "Convenience functions missing in Julia".
CSV Chunk Reader
You can read a CSV in chunks and apply logic to each chunk. The type of each column is inferred by CSV.read.
using DataConvenience

for chunk in CsvChunkIterator(filepath)
    # chunk is a DataFrame
    # do something to chunk
end
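As one illustration of "apply logic to each chunk", here is a minimal sketch that keeps a running total, so the whole file never has to be held in memory at once. The helper name count_rows is hypothetical (it is not part of DataConvenience.jl), and the sketch assumes only what is stated above: that iterating CsvChunkIterator(filepath) yields DataFrames.

```julia
using DataConvenience  # provides CsvChunkIterator
using DataFrames       # provides nrow

# Hypothetical helper (not part of DataConvenience.jl):
# count the data rows of a CSV by summing the row counts of its chunks.
function count_rows(filepath)
    total = 0
    for chunk in CsvChunkIterator(filepath)
        # each chunk is assumed to be a DataFrame, so nrow applies
        total += nrow(chunk)
    end
    return total
end
```

The same pattern should work for any reduction that can be updated one chunk at a time, such as a sum or a maximum of a column.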
The chunk iterator uses CSV.read parameters, so you can pass type or types to dictate the type of each column, e.g.
# read all columns as String
for chunk in CsvChunkIterator(filepath, type=String)
    # chunk is a DataFrame where each column is String
    # do something to chunk
end
# read a three-column CSV where the column types are String, Int, Float32
for chunk in CsvChunkIterator(filepath, types=[String, Int, Float32])
    # do something to chunk
end