[ANN] DataConvenience v0.1.2

xiaodai · April 19, 2020, 4:05pm

A number of posts have asked for a CSV chunk reader, and the major new feature in DataConvenience is a reasonably fast chunk reader based on CSV.jl.

See xiaodaigh/DataConvenience.jl on GitHub: convenience functions missing in Julia.

CSV Chunk Reader

You can read a CSV in chunks and apply logic to each chunk. The type of each column is inferred by CSV.read.

using DataConvenience

for chunk in CsvChunkIterator(filepath)
  # chunk is a DataFrame
  # do something to chunk
end
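
To show what "apply logic to each chunk" can look like without pulling in any packages, here is a minimal Base-only sketch. It is not how CsvChunkIterator is implemented; the function `sum_column_in_chunks`, its arguments, and the sample data are all made up for illustration, and the naive `split` on commas assumes no quoted fields.

```julia
# Base-only sketch of the chunk-then-combine pattern.
# `sum_column_in_chunks` is a hypothetical helper, not part of DataConvenience or CSV.jl.
function sum_column_in_chunks(io::IO, col::Int; chunk_rows::Int = 2)
    readline(io)        # skip the header row
    total = 0.0
    nchunks = 0
    # Group the lazily read lines into vectors of at most `chunk_rows` lines
    for lines in Iterators.partition(eachline(io), chunk_rows)
        # Per-chunk logic: parse one column and add its sum to the running total
        total += sum(parse(Float64, split(line, ',')[col]) for line in lines)
        nchunks += 1
    end
    return total, nchunks
end

# Small in-memory stand-in for a CSV file
io = IOBuffer("id,value\n1,1.5\n2,2.5\n3,4.0\n")
total, nchunks = sum_column_in_chunks(io, 2)
```

Only one chunk of rows is held in memory at a time, which is the point of reading in chunks; the running total is the only state carried between chunks.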

The chunk iterator uses CSV.read parameters. You can pass in type or types to dictate the type of each column, e.g.

# read all columns as String
for chunk in CsvChunkIterator(filepath, type=String)
  # chunk is a DataFrame where each column is String
  # do something to chunk
end

# read a three-column CSV where the column types are String, Int, Float32
for chunk in CsvChunkIterator(filepath, types=[String, Int, Float32])
  # do something to chunk
end
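
As a rough picture of what a per-column list like `[String, Int, Float32]` means, here is a small Base-only sketch that converts each field of one row to the type given for its column. `convert_field` and `parse_row` are hypothetical helpers written for this example, not functions from CSV.jl or DataConvenience, and the real parser is far more careful (quoting, missing values, and so on).

```julia
# Hypothetical helpers illustrating per-column type conversion.
# Strings are kept as-is; numeric types are parsed from the text.
convert_field(::Type{String}, s::AbstractString) = String(s)
convert_field(::Type{T}, s::AbstractString) where {T<:Number} = parse(T, s)

# Pair each comma-separated field with the type declared for its column
parse_row(line::AbstractString, types) =
    [convert_field(T, field) for (T, field) in zip(types, split(line, ','))]

row = parse_row("abc,42,1.5", [String, Int, Float32])
```

Here the first field stays a String, the second becomes an Int, and the third becomes a Float32, mirroring the order of the types list.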
