Reading a few rows from a BIG CSV file

Ok, I’m trying this:


using CSV, DataFrames

# Read one row and use its values to guess the column types
df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
thetypes = [typeof(df1[1,i]) for i in 1:size(df1,2)]

df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame

That failed when it reached rows whose values didn't match the types guessed from the first row. I’m manually setting a few of the types and continuing… will see what happens.

Ok this worked!


df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
thetypes = vcat([String,String],[typeof(df1[1,i]) for i in 3:size(df1,2)])

df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame

Took 118 seconds and produced 136k rows… so I guess that’s the solution.
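
For anyone who wants to try the pattern without the full file, here is a minimal sketch on a tiny made-up CSV. The temp file, its three columns, the seed and `keep_fraction` are all stand-ins I invented for illustration. It also assumes `CSV.Rows` accepts a Dict for `types` (as `CSV.File` does) and leaves the unlisted columns as strings, which would avoid guessing a type for every column.

```julia
using CSV, DataFrames, Random

# Hypothetical stand-in for the big file: a few rows written to a temp path.
path = tempname() * ".csv"
write(path, """
SERIALNO,NP,TYPE
2019GQ0000001,1,3
2019HU0000002,4,1
2019HU0000003,3,1
2019HU0000004,5,1
""")

# Only NP needs a numeric type for the filter (assumption: columns not
# listed in the Dict stay string-valued in CSV.Rows).
rows = CSV.Rows(path; types=Dict(:NP => Int))

rng = MersenneTwister(42)  # seeded so the random sample is reproducible
keep_fraction = 1.0        # use 0.2 to mimic the 20% sample above

# Iterators.filter is lazy: rows are streamed one at a time and only the
# rows that pass the predicate end up in the DataFrame.
sampled = Iterators.filter(r -> r.NP >= 3 && rand(rng) < keep_fraction, rows) |> DataFrame
```

With `keep_fraction = 1.0` every row with `NP >= 3` is kept, which makes it easy to check the filter before turning the sampling back on. If `NP` can be missing in the real data, the comparison would need something like `coalesce(r.NP, 0) >= 3`.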
