FredC
1
I’m attempting something like
out = DataFrame()
for f in readdir("csvs")
    vcat(out, CSV.read(f))
end
out is empty at the end of the for loop even though everything seems to be working.
For context, I'm trying to load a very large dataset that has been chunked into multiple CSV files, in a memory-efficient way.
nilshg
2
You probably need to assign the result of vcat, like:
out = vcat(out, CSV.read(f))
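Note that vcat doesn't modify out in place; it returns a new DataFrame, which is why the result has to be assigned back. A minimal sketch of the whole loop, assuming the files live in a csvs directory (hence the joinpath, since readdir returns bare file names) and using a hypothetical read_chunks helper; on recent CSV.jl versions you may also need to pass a sink, e.g. CSV.read(path, DataFrame):

using CSV, DataFrames

function read_chunks(dir)
    out = DataFrame()
    for f in readdir(dir)
        # vcat returns a new DataFrame, so reassign to accumulate
        out = vcat(out, CSV.read(joinpath(dir, f)))
    end
    return out
end

out = read_chunks("csvs")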
FredC
3
Thanks. That doesn't work in a for loop at global scope, but wrapping it in a function to control scope does the trick.
If there are better ways to handle large data I'd be interested. This is my fallback after JuliaDB's loadndsparse totally failed.
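For reference, the scoping issue: a top-level for loop introduces its own scope, so out = vcat(out, ...) creates a new loop-local out (and the right-hand side reads that uninitialized local, hence the failure). Besides wrapping the loop in a function, declaring the variable global inside the loop should also work, roughly:

out = DataFrame()
for f in readdir("csvs")
    # global makes the assignment target the top-level out instead of a loop-local
    global out = vcat(out, CSV.read(joinpath("csvs", f)))
end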
FredC
4
This may be impractically slow, however… it seems to be getting slower and slower, and memory may be starting to bloat…
This can be done with mapreduce:
out = mapreduce(vcat, readdir("csvs")) do f
    CSV.read(f)
end
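On the slowdown: every vcat in the loop (and in the pairwise mapreduce reduction) copies everything accumulated so far, so the total work grows roughly quadratically with the number of chunks. Reading all the chunks first and concatenating once avoids that, and DataFrames.jl has a specialized fast method for reduce(vcat, ...) over a vector of data frames. A sketch, under the same csvs-directory assumption:

using CSV, DataFrames

# read every chunk first, then concatenate once
dfs = [CSV.read(joinpath("csvs", f)) for f in readdir("csvs")]
out = reduce(vcat, dfs)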