In pandas we have `!cat`, where we can see the .csv before loading it… does Julia have something similar?
CSV.jl loads memory-mapped views of CSV files by default. This means that if you do `CSV.read("filename.csv")`, most of the file is not yet loaded into memory. Better yet, this view behaves like a perfectly ordinary DataFrame (because it is one). For example:
using CSV, DataFrames
csv = CSV.read("filename.csv")  # newer CSV.jl versions require a sink: CSV.read("filename.csv", DataFrame)
first(csv, 6)
will give you the first 6 rows of the table, reading only the data it needs. Even on a 20 GB CSV file, this should take about as long as it does on a small one.
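If you only want to eyeball the raw text, as with `!cat`, the Julia REPL has a shell mode (press `;` at the prompt), so on a Unix-like system `head filename.csv` works there much like IPython's `!head`. To stay inside Julia, here is a minimal Base-only sketch. `peek_lines` is a hypothetical helper name, and the temporary file just stands in for your real CSV.

```julia
# Write a small stand-in CSV so the example is self-contained (hypothetical data).
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "id,name,score")
    for i in 1:1000
        println(io, i, ",name", i, ",", i * 10)
    end
end

"""
    peek_lines(path, n=6)

Hypothetical helper: return the first `n` lines of the file at `path`.
`eachline` is lazy, so reading stops after `n` lines instead of
pulling the whole file into memory.
"""
function peek_lines(path::AbstractString, n::Integer=6)
    open(path) do io
        collect(Iterators.take(eachline(io), n))
    end
end

# Print the header plus the first few data rows.
for line in peek_lines(path)
    println(line)
end
```

This shows raw lines, not parsed columns. If you want parsed rows from only the top of the file, `CSV.File` accepts a `limit` keyword (e.g. `CSV.File("filename.csv"; limit=6)`); check the docs for your CSV.jl version.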