Best way to iteratively add to a DataFrame?

The solution is correct, but I have some minor additional notes.

reduce(vcat, [DataFrame(a = rand(i)) for i in 1:5])

is only minimally faster than

vcat([DataFrame(a = rand(i)) for i in 1:5]...)

(The change was merged to master yesterday and has not been released yet; until then, splatting was the recommended approach.)
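
As a quick sanity check that the two spellings are interchangeable, here is a minimal sketch. The names `parts`, `df_reduce`, and `df_splat` are just illustrative, and it makes no claim about timings:

```julia
using DataFrames

# Build the pieces once so both calls see exactly the same data
parts = [DataFrame(a = rand(i)) for i in 1:5]

df_reduce = reduce(vcat, parts)   # reduction form
df_splat  = vcat(parts...)        # splatting form

# Both stack the pieces in order: 1 + 2 + 3 + 4 + 5 = 15 rows
@assert nrow(df_reduce) == 15
@assert df_reduce == df_splat
```

Splatting builds one call with as many arguments as there are data frames, so for a long list the `reduce` form is probably the safer habit.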

Also, creating intermediate data frames is not efficient. The recommended way to add rows to a data frame is:

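using DataFrames
dflong = DataFrame(a=Float64[])  # empty data frame with one Float64 column
for i = 1:3
    # push! adds one row; the tuple must hold one scalar per column
    # (rand(i) would be a Vector and cannot be stored in a Float64 column)
    push!(dflong, (rand(),))
end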

(You can read the documentation of push! to find the accepted row types; in particular, you can push! a NamedTuple, a dictionary, a vector, or a tuple.)
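
To make the row types concrete, here is a small sketch that pushes one row of each kind. The two-column layout (`a`, `b`) and the values are made up for illustration; check the push! docstring of your DataFrames version for the full list of accepted inputs:

```julia
using DataFrames

# Hypothetical two-column data frame, used only for this illustration
df = DataFrame(a = Float64[], b = String[])

push!(df, (1.0, "x"))                   # tuple: values matched by position
push!(df, (a = 2.0, b = "y"))           # NamedTuple: values matched by column name
push!(df, Dict(:a => 3.0, :b => "z"))   # dictionary keyed by column name
push!(df, [4.0, "w"])                   # vector: values matched by position

@assert nrow(df) == 4
```

The name-based forms (NamedTuple, dictionary) are handy when the column order is easy to get wrong.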

If you really have to create intermediate DataFrames, you can also use append!, which is also relatively fast (and you do not have to keep all the data frames in memory before vcat-ing them):

using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
    append!(dflong, DataFrame(a=rand(i)))
end