I would like to read a huge CSV file (around 9 GB) and apply filters to the rows in order to build another dataframe carrying only the values I want. I have installed the package JuliaDB and just tried this simple command:
using JuliaDB
flights = loadtable("Document.csv")
and I got the following error:
OutOfMemoryError()
How can I deal with a request of this type in Julia? Thanks in advance.
Unfortunately, handling large data in Julia is difficult at the moment. I would recommend opening a CSV.File, iterating through it, and only keeping what you need. Look at the docstrings in CSV.jl.
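Something along these lines, as a minimal sketch of that pattern (the column names colA/colB are placeholders, not from your actual file):

using CSV

# Stream over the file row by row; only the values you choose to keep
# ever accumulate in memory.
vals = Float64[]
for row in CSV.File("Document.csv")
    # `colA` and `colB` are made-up column names for illustration
    row.colA > 0 && push!(vals, row.colB)
end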
Thanks for your answer. I was wondering about your suggestion: I cannot use CSV.read either, because that function also cannot handle such a huge CSV file. Maybe I didn't understand your suggestion well. Could you clarify it for me? Thanks.
As @Tamas_Papp mentioned, using CSV.File(file; kw...) will return a CSV.File object which doesn’t load the entire dataset into memory. You could then “build up” a table by iterating over the rows and filtering as you’d like, something like:
using CSV

# Build a table of only the rows for which `filter` returns true,
# reading the file row by row rather than all at once.
function buildtable(filter::Function, file)
    f = CSV.File(file)
    # create a NamedTuple of Vectors to push! to
    table = (colA = Int[], colB = Float64[], colC = String[])
    for row in f
        if filter(row)
            push!(table.colA, row.colA)
            push!(table.colB, row.colB)
            push!(table.colC, row.colC)
        end
    end
    return table
end
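You could then call it with whatever predicate you need. And if you also have DataFrames.jl installed, a NamedTuple of vectors is a valid Tables.jl source, so the conversion is one line (the column names below are the placeholders from the function above):

using DataFrames

# keep only the rows where colA is positive
t = buildtable(row -> row.colA > 0, "Document.csv")
df = DataFrame(t)  # a NamedTuple of vectors converts directly to a DataFrame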