The 4 minutes were for reading a CSV file containing a 40_000 x 40_000 matrix of floats (30 GB on disk, 6.4 GB in RAM).
Specifying the column type as Float64 in CSV.File didn't help in my case.
Tomorrow I will try reading it into a matrix instead of a DataFrame.
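Reading straight into a matrix, as planned above, could look roughly like this. It is a sketch that assumes the CSV.jl and Tables.jl packages are installed and that the file has no header row; `read_csv_matrix` is a hypothetical helper name, not part of either package.

```julia
using CSV, Tables

# Hypothetical helper: parse every column as Float64 and copy the columns
# into a dense matrix, skipping the DataFrame step.
# `header=false` assumes the file has no header row (adjust if yours does).
function read_csv_matrix(path::AbstractString)
    f = CSV.File(path; header=false, types=Float64)
    return Tables.matrix(f)
end
```

Usage would be something like `M = read_csv_matrix("data.csv")`, where the file name is only an example.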
How large such an array is in .csv format depends on the number of digits you save and on the number of zeros (a zero takes far fewer characters than a full-precision float).
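A quick sanity check of the sizes quoted in this thread: in memory a dense matrix needs rows x columns x bytes per element, while the CSV size is set by the average number of characters written per value. Note that 6.4 GB corresponds to 4-byte elements (such as Float32); a 40_000 x 40_000 Float64 matrix would take about 12.8 GB. The 30 GB figure is taken from the post, and the per-value average below is simply derived from it.

```julia
# Dense in-memory size: rows x columns x bytes per element.
n = 40_000
bytes_f64 = n^2 * sizeof(Float64)   # 12_800_000_000 bytes, about 12.8 GB
bytes_f32 = n^2 * sizeof(Float32)   #  6_400_000_000 bytes, about 6.4 GB

# Average CSV bytes per value for a 30 GB file
# (digits, sign, decimal point, exponent and separator together).
csv_bytes_per_value = 30e9 / n^2    # 18.75

println("Float64: ", bytes_f64, " bytes; Float32: ", bytes_f32, " bytes")
println("CSV bytes per value: ", csv_bytes_per_value)
```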
Did you start Julia with julia -t auto?
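Starting Julia with julia -t auto gives it one thread per logical CPU, and CSV.jl can use those threads to parse large files in parallel. A minimal check from inside the session, using only Base:

```julia
# Start Julia from a shell with, e.g.:
#   julia -t auto
# (or set the JULIA_NUM_THREADS environment variable before starting Julia).
# Then confirm how many threads this session actually has.
n = Threads.nthreads()
println("Julia is running with $n thread(s)")
```

If this prints 1, the parser has only a single thread to work with.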
Thanks Uwe,
I was wrong about the conversion time from CSV to Arrow.
For the record, this is what I got for the 30 GB CSV file (on disk) containing a 40_000 x 40_000 square matrix of Float64 (6.4 GB in RAM):
- CSV.File to DataFrame: 3.5 min
- CSV.File to Tables.matrix: 3.2 min
- CSV to Arrow conversion: 4.9 min
- Arrow to Tables.matrix: 10 s
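The two-step workflow timed above (convert once, then load the Arrow file) can be sketched as follows. This assumes the CSV.jl, Arrow.jl and Tables.jl packages are installed and that the CSV has no header row; the helper names `csv_to_arrow` and `arrow_to_matrix` are hypothetical, not from the original posts.

```julia
using CSV, Arrow, Tables

# Hypothetical helper, run once: parse the CSV (all columns as Float64)
# and store the resulting table as an Arrow file.
# `header=false` assumes the file has no header row.
function csv_to_arrow(csvpath::AbstractString, arrowpath::AbstractString)
    tbl = CSV.File(csvpath; header=false, types=Float64)
    Arrow.write(arrowpath, tbl)
    return arrowpath
end

# Hypothetical helper for later sessions: load the Arrow file
# and copy its columns into a dense matrix.
function arrow_to_matrix(arrowpath::AbstractString)
    tbl = Arrow.Table(arrowpath)
    return Tables.matrix(tbl)
end
```

The slow CSV parse is paid only once by `csv_to_arrow`; each later session only needs `arrow_to_matrix`, which matches the 10 s figure reported above.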
Many thanks, Uwe, for your help.
It works perfectly now. Converting to Arrow is the easiest and quickest option for me, since reading the Arrow file back into a matrix takes only 10 s.
Thank you again!
This solution is totally rad