Vcat list of dataframes in for loop

I’m attempting something like

out = DataFrame()
for f in readdir("csvs")
    vcat(out, CSV.read(f))
end

out is empty at the end of the for loop even though everything seems to be working.

Context is that I’m trying to memory-efficiently load a very large dataset that has been chunked into multiple csv files.

You probably need to assign the result of vcat like

out = vcat(out, CSV.read(f))

Thanks. That doesn’t work in a top-level for loop, but wrapping the loop in a function to control scope does the trick.
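For anyone hitting the same thing: the likely reason is Julia's scoping rules. In Julia 1.0 through 1.4, and in script files on later versions, assigning to an existing global such as out inside a top-level for loop creates a new local variable unless it is marked global. Inside a function the variable is an ordinary local, so rebinding it works. Here is a minimal sketch of the function approach. The name load_chunks is made up for illustration, and the two-argument CSV.read(path, DataFrame) form assumes a recent CSV.jl.

```julia
using CSV, DataFrames

# Hypothetical helper: read every file in `dir` and stack the rows.
function load_chunks(dir)
    out = DataFrame()
    for f in readdir(dir)
        # readdir returns bare file names, so prepend the directory
        chunk = CSV.read(joinpath(dir, f), DataFrame)
        # `out` is local to the function, so rebinding it here works
        out = vcat(out, chunk)
    end
    return out
end
```

Calling out = load_chunks("csvs") should then give the combined table. Writing global out = vcat(out, ...) inside the original loop is the other option if you want to stay at top level.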

If there are better ways to handle large data I’d be interested. This is my fallback after JuliaDB loadndsparse totally failed.

This may be impractically slow, however… it seems to be getting slower and slower, and memory may be starting to bloat…
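The slowdown is expected with this pattern: every out = vcat(out, chunk) allocates a brand-new data frame and copies all the rows accumulated so far, so the total copying grows roughly quadratically with the number of files, and the discarded copies hang around until the garbage collector runs. One possible alternative is to grow a single data frame in place with append!. The sketch below assumes a recent DataFrames.jl and CSV.jl, and that every chunk has the same column names and element types (append! does not promote column types by default). The name load_chunks_inplace is made up for illustration.

```julia
using CSV, DataFrames

# Hypothetical helper: grow one DataFrame in place instead of rebuilding it.
function load_chunks_inplace(dir)
    out = DataFrame()
    for f in readdir(dir)
        # readdir returns bare file names, so prepend the directory
        chunk = CSV.read(joinpath(dir, f), DataFrame)
        # add the chunk's rows to the existing columns of `out`
        append!(out, chunk)
    end
    return out
end
```

Only one chunk plus the accumulated result is alive at a time, which should keep peak memory closer to the size of the final table.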

This can be done with a mapreduce.

out = mapreduce(vcat, readdir("csvs")) do f
    # readdir returns bare file names, so prepend the directory
    CSV.read(joinpath("csvs", f))
end
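A variant of the same idea, sketched under a few assumptions: DataFrames.jl has a specialized reduce(vcat, ...) method for a collection of data frames, so reading all chunks into a vector first and concatenating them in one call avoids repeated pairwise copies. The trade-off is that all chunks and the result are in memory at the same time. The sketch assumes a recent CSV.jl (two-argument CSV.read) and at least one file in the directory. The name load_chunks_reduce is made up for illustration.

```julia
using CSV, DataFrames

# Hypothetical helper: read all chunks first, then concatenate them in one call.
function load_chunks_reduce(dir)
    # build full paths, since readdir returns bare file names
    paths = joinpath.(dir, readdir(dir))
    chunks = [CSV.read(p, DataFrame) for p in paths]
    return reduce(vcat, chunks)
end
```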