FredC
1
I’m attempting something like
out = DataFrame()
for f in readdir("csvs")
    vcat(out, CSV.read(f))
end
out is empty at the end of the for loop even though everything seems to be working.
For context, I'm trying to load a very large dataset that has been chunked into multiple CSV files, in a memory-efficient way.
nilshg
2
You probably need to assign the result of vcat, like:
out = vcat(out, CSV.read(f))
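Note that vcat doesn't modify out in place; it returns a new DataFrame, which is why the result has to be assigned back. A minimal sketch of the whole loop, assuming the files live in a csvs directory (hence the joinpath, since readdir returns bare file names) and using a hypothetical read_chunks helper; on recent CSV.jl versions you may also need to pass a sink, e.g. CSV.read(path, DataFrame):

using CSV, DataFrames

function read_chunks(dir)
    out = DataFrame()
    for f in readdir(dir)
        # vcat returns a new DataFrame, so reassign to accumulate
        out = vcat(out, CSV.read(joinpath(dir, f)))
    end
    return out
end

out = read_chunks("csvs")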
FredC
3
Thanks. That doesn't work in a for loop at global scope, but wrapping it in a function to control scope does the trick.
If there are better ways to handle large data I'd be interested. This is my fallback after JuliaDB's loadndsparse totally failed.
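For reference, the scoping issue: a top-level for loop introduces its own scope, so out = vcat(out, ...) creates a new loop-local out (and the right-hand side reads that uninitialized local, hence the failure). Besides wrapping the loop in a function, declaring the variable global inside the loop should also work, roughly:

out = DataFrame()
for f in readdir("csvs")
    # global makes the assignment target the top-level out instead of a loop-local
    global out = vcat(out, CSV.read(joinpath("csvs", f)))
end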
FredC
4
This may be impractically slow, however… it seems to be getting slower and slower, and memory may be starting to bloat…
This can be done with mapreduce:
out = mapreduce(vcat, readdir("csvs")) do f
    CSV.read(f)
end
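On the slowdown: every vcat in the loop (and in the pairwise mapreduce reduction) copies everything accumulated so far, so the total work grows roughly quadratically with the number of chunks. Reading all the chunks first and concatenating once avoids that, and DataFrames.jl has a specialized fast method for reduce(vcat, ...) over a vector of data frames. A sketch, under the same csvs-directory assumption:

using CSV, DataFrames

# read every chunk first, then concatenate once
dfs = [CSV.read(joinpath("csvs", f)) for f in readdir("csvs")]
out = reduce(vcat, dfs)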