Best way to iteratively add to a DataFrame?

bkamins · February 22, 2019, 4:52pm

The solution is correct, but I have some minor additional notes.

reduce(vcat, [DataFrame(a = rand(i)) for i in 1:5])

is only minimally faster than

vcat([DataFrame(a = rand(i)) for i in 1:5]...)

(the change was merged to master yesterday and has not been released yet; before that, splatting was the recommended approach).
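As a quick sanity check that the two spellings give the same result, here is a minimal sketch. The name `dfs` and the fixed (non-random) column values are my own assumptions, chosen only so the comparison is reproducible.

```julia
using DataFrames

# Hypothetical input: five small data frames with 1, 2, ..., 5 rows each
dfs = [DataFrame(a = collect(1.0:i)) for i in 1:5]

# reduce form, applied to a vector of data frames
stacked_reduce = reduce(vcat, dfs)

# Splatting form: every data frame is passed as a separate argument
stacked_splat = vcat(dfs...)

# Both hold the same 15 rows in the same order
@assert stacked_reduce == stacked_splat
@assert nrow(stacked_reduce) == 15
```

The sketch only checks that the results are equal; it does not measure timing.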

Also, creating intermediate data frames is not efficient. The recommended way to add rows to a data frame is:

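using DataFrames
# Start from an empty data frame with a single Float64 column
dflong = DataFrame(a=Float64[])
for i = 1:3
    # Each push! adds one row; the tuple holds one scalar per column
    push!(dflong, (rand(),))
end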

(See the documentation of push! for the accepted row types; in particular, you can push! a NamedTuple, a dictionary, a vector, or a tuple.)
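To make the row types concrete, here is a small sketch that pushes one row of each kind. The two-column layout and the values are assumptions made for illustration only.

```julia
using DataFrames

# Two columns, so positional and name-based rows are easy to tell apart
df = DataFrame(a = Float64[], b = String[])

push!(df, (1.0, "tuple"))                 # Tuple: values matched by position
push!(df, [2.0, "vector"])                # Vector: values matched by position
push!(df, (a = 3.0, b = "named tuple"))   # NamedTuple: values matched by column name
push!(df, Dict(:a => 4.0, :b => "dict"))  # Dict: keys are the column names
```

The name-based forms are usually easier to keep correct when a data frame has many columns, since they do not depend on column order.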

If you really have to create intermediate DataFrames, you can also use append!, which is relatively fast as well (and you do not have to keep all the data frames in memory before vcat-ing them):

using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
    append!(dflong, DataFrame(a=rand(i)))
end
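The following sketch checks that the append! loop ends up with the same rows as concatenating all chunks at once. `make_chunk` is a hypothetical helper standing in for whatever produces each intermediate data frame; it returns fixed values so the result is reproducible.

```julia
using DataFrames

# Hypothetical chunk producer: i rows, each holding the value i
make_chunk(i) = DataFrame(a = fill(Float64(i), i))

dflong = DataFrame(a = Float64[])
for i in 1:3
    # append! mutates dflong in place, so no list of chunks is kept around
    append!(dflong, make_chunk(i))
end

# Same rows as building every chunk first and concatenating them in one go
@assert dflong == reduce(vcat, [make_chunk(i) for i in 1:3])
```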
