A number of posts have asked for a CSV chunk reader, so the new major feature in DataConvenience is a reasonably fast chunk reader based on CSV.jl.
CSV Chunk Reader
You can read a CSV in chunks and apply logic to each chunk. The type of each column is inferred by CSV.read.
    using DataConvenience

    for chunk in CsvChunkIterator(filepath)
        # chunk is a DataFrame
        # do something to chunk
    end
The chunk iterator accepts CSV.read parameters. The user can pass in type or types to dictate the type of each column, e.g.
    # read all columns as String
    for chunk in CsvChunkIterator(filepath, type=String)
        # chunk is a DataFrame where each column is a String
        # do something to chunk
    end
    # read a three-column CSV where the column types are String, Int, Float32
    for chunk in CsvChunkIterator(filepath, types=[String, Int, Float32])
        # do something to chunk
    end
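To make "apply logic to each chunk" a bit more concrete, here is a rough sketch that keeps running totals across chunks rather than holding the whole file in memory. It assumes DataConvenience and DataFrames are installed and that each chunk behaves like an ordinary DataFrame, as described above. The temporary file, its column names (name, count, score) and the summarise helper are made up for illustration and are not part of the package.

```julia
using DataConvenience  # provides CsvChunkIterator
using DataFrames       # provides nrow

# Write a small throwaway CSV so the example is self-contained.
# The file layout here is hypothetical.
filepath = tempname() * ".csv"
open(filepath, "w") do io
    println(io, "name,count,score")
    for i in 1:1000
        println(io, "item", i, ",", i, ",", i / 2)
    end
end

# Hypothetical helper: accumulate per-chunk results into running totals.
function summarise(filepath)
    total_rows = 0
    total_count = 0
    for chunk in CsvChunkIterator(filepath)
        total_rows += nrow(chunk)        # rows seen so far
        total_count += sum(chunk.count)  # running sum of the count column
    end
    return total_rows, total_count
end

total_rows, total_count = summarise(filepath)
println("rows: ", total_rows, ", sum of count: ", total_count)
```

The same pattern should work for any reduction that can be updated one chunk at a time (counts, sums, minima and maxima); anything that needs all rows at once, such as an exact median, would need a different approach.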