Ok, I’m trying this:
df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
thetypes = [typeof(df1[1,i]) for i in 1:size(df1,2)]
df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame
That failed when it got to some rows whose values didn’t match the types detected from the first row. I’m manually setting a few of the types and continuing… will see what happens.
Ok this worked!
df1 = CSV.read("psam_husa.csv",DataFrame,limit=1)
thetypes = vcat([String,String],[typeof(df1[1,i]) for i in 3:size(df1,2)])
df = Iterators.filter(x-> rand() < .2 && x[:NP] >= 3,CSV.Rows("psam_husa.csv";types=thetypes)) |> DataFrame
Took 118 seconds and produced 136k rows… so I guess that’s the solution.
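For anyone who wants to try the pattern without the full file, here is a minimal sketch on a tiny made-up CSV. The column names other than NP, the sample values, and the variable names (path, keep_prob, coltypes, sample) are all just assumptions for illustration. keep_prob is set to 1.0 so the result is deterministic; the run above used 0.2.

```julia
using CSV, DataFrames

# Hypothetical stand-in for psam_husa.csv: a tiny file with made-up values.
path = tempname() * ".csv"
write(path, """
RT,SERIALNO,NP,WGTP
H,2018GQ0000001,1,10
H,2018HU0000002,4,25
H,2018HU0000003,3,7
H,2018HU0000004,2,12
""")

# Detect column types from the first data row only.
first_row = CSV.read(path, DataFrame; limit=1)

# Force the first two columns to String, keep the detected types for the rest.
coltypes = vcat([String, String],
                [typeof(first_row[1, i]) for i in 3:ncol(first_row)])

# Sampling probability; rand() is always < 1.0, so every row passes this part.
keep_prob = 1.0

# CSV.Rows streams the file row by row; only rows that pass the
# predicate are collected into the DataFrame.
rows = CSV.Rows(path; types=coltypes)
sample = Iterators.filter(r -> rand() < keep_prob && r.NP >= 3, rows) |> DataFrame
```

Since the types come from a single row, any column whose first value isn't representative (an ID that looks numeric, for example) still needs a manual override like the two String entries here.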