Yes, it’d be helpful if you provided some more details here: what format is your data in? csv? feather? excel? something else? And why is processing more than 100 columns at a time a problem? I ask because on my 5-year-old laptop, I can process certain csv files with 20,000 columns without much trouble.
In the CSV.jl package, a recent addition is the CSV.Rows type, which allows efficient row-by-row iteration over the values in a csv file. It even accepts a reusebuffer=true keyword argument, which allocates a single buffer that gets re-used for each row while iterating, so memory use stays flat regardless of file size. So you could process an entire file by doing something like:
using CSV

for row in CSV.Rows(filename; reusebuffer=true)
    # do things with row values: row.col1, row.col2, etc.,
    # where `col1` is a column name in the csv file
end
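For example, here's a minimal sketch of computing the mean of one column this way without ever holding the whole file in memory. The file name "data.csv" and the column name x1 are placeholders, so substitute your own; it also assumes the column has no missing values.

using CSV

# Stream a large csv one row at a time and accumulate a running sum,
# so memory use stays flat no matter how many rows the file has.
function column_mean(filename, colname)
    total = 0.0
    nrows = 0
    for row in CSV.Rows(filename; reusebuffer=true)
        # Values come back as strings by default, so parse what you need.
        # (Assumes no missing values in this column.)
        total += parse(Float64, getproperty(row, colname))
        nrows += 1
    end
    return total / nrows
end

println(column_mean("data.csv", :x1))

Wrapping the loop in a function keeps the accumulators in local scope and lets Julia compile the hot loop, which matters if the file is large.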
Hope that helps?