Reading a few rows from a BIG CSV file

dlakelan · September 23, 2021, 7:37pm

Ok, I’m trying this:


using CSV, DataFrames

# Read just the first row to get a guess at the column types
df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
thetypes = [typeof(df1[1,i]) for i in 1:size(df1,2)]

df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame

That failed partway through: the types detected from only the first row didn’t match the values in some later rows. I’m manually setting a few of the types and continuing… will see what happens.

Ok this worked!


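df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
# Force the first two columns to String; take the rest from the first row
thetypes = vcat([String,String],[typeof(df1[1,i]) for i in 3:size(df1,2)])

# Stream the file row by row, keeping a random ~20% of the rows with NP >= 3
df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame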

Took 118 seconds and produced 136k rows… so I guess that’s the solution.
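
A variation that might avoid hand-patching the types: infer them from a larger probe of rows instead of just one, and ask CSV.read for plain String columns. This is only a sketch under assumptions, not something run against psam_husa.csv. The tiny temp file, the RT and SERIALNO columns, the names probe and keep_prob, and the 2-row probe limit are all made up for illustration; only the NP column comes from the code above.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the big file so the sketch runs on its own
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "RT,SERIALNO,NP")
    println(io, "H,2019GQ0000001,1")
    println(io, "H,2019HU0000002,4")
    println(io, "H,2019HU0000003,3")
end

# Infer types from several rows rather than one (use a much larger limit on real data).
# stringtype=String asks for plain String columns instead of inline string types.
probe = CSV.read(path, DataFrame; limit=2, stringtype=String)
thetypes = Type[eltype(col) for col in eachcol(probe)]

# Fraction of matching rows to keep; 1.0 keeps them all so the demo is deterministic
keep_prob = 1.0

# coalesce turns a missing comparison into false so the filter never sees a non-Bool
keep(r) = rand() < keep_prob && coalesce(r.NP >= 3, false)

# Stream the file, filter, and materialize only the surviving rows
df = Iterators.filter(keep, CSV.Rows(path; types=thetypes)) |> DataFrame
```

A probe can still guess wrong for columns whose odd values only show up late in the file, so a few manual overrides may still be needed.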
