Best way to iteratively add to a DataFrame?

torgo · February 22, 2019, 3:56pm

I would like to do something like this:

using DataFrames
dflong = DataFrame()
for i = 1:3
    df = DataFrame(a = rand(i))
    vcat(dflong, df)
end

I understand that this doesn’t work for two reasons:

dflong cannot be modified inside the local for scope
Even if it could, dflong and df have a different number of columns.

I have devised a solution that works, but seems very ugly, inelegant, and perhaps inefficient:

using DataFrames

dflong = DataFrame()
first = true

for i = 1:3
    df = DataFrame(a = rand(i))

    global dflong
    global first

    if first
        dflong = similar(df, 0)  # empty frame with the same columns as df
        first = false
    end
    dflong = vcat(dflong, df)    # concatenate every chunk, including the first
end

Can you suggest a better way to do this?
I am new to Julia so probably just not getting something basic here about the proper way to adapt to for loops with local scope.

kristoffer.carlsson · February 22, 2019, 4:39pm

julia> reduce(vcat, [DataFrame(a = rand(i)) for i in 1:5])
15×1 DataFrame
│ Row │ a         │
│     │ Float64   │
├─────┼───────────┤
│ 1   │ 0.0250787 │
│ 2   │ 0.144394  │
│ 3   │ 0.216657  │
│ 4   │ 0.761747  │
│ 5   │ 0.351675  │
│ 6   │ 0.284681  │
│ 7   │ 0.106181  │
│ 8   │ 0.551472  │
│ 9   │ 0.523894  │
│ 10  │ 0.51445   │
│ 11  │ 0.587754  │
│ 12  │ 0.878151  │
│ 13  │ 0.985698  │
│ 14  │ 0.504822  │
│ 15  │ 0.788035  │

torgo · February 22, 2019, 4:50pm

Thanks, this is helpful.
In practice (outside of my simple example) I would like to do many operations inside the for loop before concatenating the data frame, so I can’t just call a constructor in a comprehension.
What’s a good solution for those types of situations?

bkamins · February 22, 2019, 4:52pm

The solution is correct, but I have some minor additional notes.

reduce(vcat, [DataFrame(a = rand(i)) for i in 1:5])

is only minimally faster than

vcat([DataFrame(a = rand(i)) for i in 1:5]...)

(the change was merged to master yesterday and has not been released yet; previously, splatting was the recommended approach).
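
As a minimal sketch of the two forms compared above, the following builds the pieces once and checks that both calls give the same result. The variable names (`pieces`, `stacked_reduce`, `stacked_splat`) are illustrative only and do not come from the thread.

```julia
using DataFrames

# Build the pieces once so both calls see the same data.
pieces = [DataFrame(a = rand(i)) for i in 1:5]

stacked_reduce = reduce(vcat, pieces)  # no splatting
stacked_splat  = vcat(pieces...)       # splats the vector into separate arguments

# Both results have 1 + 2 + 3 + 4 + 5 = 15 rows and the same contents.
println(nrow(stacked_reduce))
println(stacked_reduce == stacked_splat)
```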

Also, creating intermediate data frames is not efficient. The recommended way to add rows to a data frame is:

using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
    for x in rand(i)         # push! adds a single row per call
        push!(dflong, (x,))
    end
end

(See the documentation of push! for the accepted row types; in particular, you can push! a NamedTuple, a dictionary, a vector, or a tuple.)
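
A short sketch of the row types mentioned above, pushed onto a two-column data frame. The column names `a` and `b` and the values are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = Float64[], b = String[])

push!(df, (1.0, "tuple"))                       # Tuple: matched by position
push!(df, [2.0, "vector"])                      # Vector: matched by position
push!(df, (a = 3.0, b = "named tuple"))         # NamedTuple: matched by name
push!(df, Dict(:a => 4.0, :b => "dictionary"))  # Dict: matched by key

println(df)
```

Each call adds exactly one row, so the column types declared up front (`Float64[]`, `String[]`) have to be able to hold the pushed values.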

If you really have to create intermediate DataFrames, you can also use append!, which is relatively fast (and you do not have to keep all the data frames in memory before vcat-ing them):

using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
    append!(dflong, DataFrame(a=rand(i)))
end
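
For the case raised earlier in the thread, where each iteration does more work than a single constructor call, one possible arrangement is to put the per-iteration work in a helper and the loop in a function, which also avoids the global declarations. This is only a sketch: `process_chunk` and `build_long` are hypothetical names, and the derived columns `b` and `iter` stand in for whatever operations are actually needed.

```julia
using DataFrames

# Hypothetical stand-in for the per-iteration work.
function process_chunk(i)
    chunk = DataFrame(a = rand(i))
    chunk.b = chunk.a .* 2     # a derived column
    chunk.iter = fill(i, i)    # record which iteration produced each row
    return chunk
end

function build_long(n)
    # Declare the columns up front so append! has matching names and types.
    dflong = DataFrame(a = Float64[], b = Float64[], iter = Int[])
    for i in 1:n
        append!(dflong, process_chunk(i))  # mutates dflong in place
    end
    return dflong
end

dflong = build_long(3)
println(dflong)
```

Because append! mutates its first argument rather than rebinding the name, nothing inside the loop assigns to `dflong`, so the scoping problem from the original question does not come up.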

nalimilan · February 22, 2019, 7:20pm

This is a situation where allowing push! and append! to add new columns if the data frame has zero columns would be convenient. Not sure whether that justifies this exception.

pdeffebach · February 23, 2019, 12:44am

You should also be able to vcat a DataFrame with a Dict provided the symbols are the same as the DataFrame’s columns. Since a Dict is lighter weight (I think) this might be a solution depending on the details of your problem.

bjarthur · February 23, 2019, 2:22am

push!ing onto TypedTables is possible with an issue I created.

bkamins · February 23, 2019, 6:04am

append! should be OK, but push! is problematic, because:

if what we push is a vector/tuple we do not have column names
if what we push is a dict/named tuple the current behavior of push! is to add only a selection of columns that already exist in a DataFrame, so we would add no columns.

nalimilan · February 23, 2019, 10:03am

Yeah, that would only work when pushing a named tuple or DataFrameRow…
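
For readers on a later DataFrames.jl release: as far as I know, the special case discussed here was eventually adopted, so that push! with a named tuple and append! with a data frame both create the columns when the target has none. The sketch below assumes such a release is installed; on the version current at the time of this thread it would be expected to fail. The names `dfpush` and `dfappend` are illustrative.

```julia
using DataFrames

# Assumes a DataFrames.jl release that supports the zero-column special case.
dfpush = DataFrame()
for i in 1:3
    push!(dfpush, (a = rand(), i = i))  # the first push! creates columns a and i
end

dfappend = DataFrame()
for i in 1:3
    append!(dfappend, DataFrame(a = rand(i)))  # the first append! creates column a
end

println(size(dfpush))
println(size(dfappend))
```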
