using DataFrames
dflong = DataFrame()
for i = 1:3
df = DataFrame(a = rand(i))
vcat(dflong, df)
end
I understand that this doesn't work for two reasons:
dflong cannot be modified inside the local for scope
Even if it could, dflong and df have a different number of columns.
I have devised a solution that works, but seems very ugly, inelegant, and perhaps inefficient:
using DataFrames
dflong = DataFrame()
first = true
for i = 1:3
df = DataFrame(a = rand(i))
global dflong
global first
if first
dflong = copy(df) # start from the first chunk; similar(df, 0) would drop its rows
first = false
else
dflong = vcat(dflong, df)
end
end
Can you suggest a better way to do this?
I am new to Julia so probably just not getting something basic here about the proper way to adapt to for loops with local scope.
Thanks, this is helpful.
In practice (outside of my simple example) I would like to do many operations inside the for loop before concatenating the data frame, so I can't use a constructor.
What's a good solution for those types of situations?
The solution is correct, but I have some minor additional notes.
reduce(vcat, [DataFrame(a = rand(i)) for i in 1:5])
is only minimally faster than
vcat([DataFrame(a = rand(i)) for i in 1:5]...)
(The change was merged to master yesterday and has not been released yet; before that, splatting was the recommended approach.)
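As a quick check that the two forms are interchangeable, here is a minimal sketch that builds the pieces once and stacks them both ways. The variable names are only illustrative, and the relative timing will depend on your DataFrames.jl version.

```julia
using DataFrames

# Build the pieces first; piece i has a single column `a` with i rows.
pieces = [DataFrame(a = rand(i)) for i in 1:5]

# Two ways to stack the pieces into one data frame.
stacked_reduce = reduce(vcat, pieces)
stacked_splat = vcat(pieces...)

# Both results hold 1 + 2 + 3 + 4 + 5 rows with the same contents.
@assert nrow(stacked_reduce) == 15
@assert stacked_reduce == stacked_splat
```

Because the pieces are collected in an ordinary vector, each one can be built with as many intermediate steps as needed before the final concatenation.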
Also, creating intermediate data frames is not efficient. The recommended way to add rows to a data frame is:
using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
for x in rand(i) # push! adds one row at a time, so push each value separately
push!(dflong, (x,))
end
end
(You can read the documentation of push! to find the accepted row types; in particular, you can push! a NamedTuple, a dictionary, a vector, or a tuple.)
If you really have to create intermediate DataFrames, you can also do it with append!, which will also be relatively fast (and you do not have to keep all the data frames in memory before vcat-ing them):
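To make the accepted row types concrete, here is a small sketch that pushes one row of each kind into the same data frame. The two-column layout (`a`, `b`) is made up for illustration, and I am assuming a reasonably recent DataFrames.jl release.

```julia
using DataFrames

# An empty data frame with typed columns (illustrative layout).
df = DataFrame(a = Float64[], b = String[])

push!(df, (1.0, "tuple"))                  # Tuple: values matched by position
push!(df, [2.0, "vector"])                 # Vector: values matched by position
push!(df, (a = 3.0, b = "named tuple"))    # NamedTuple: values matched by name
push!(df, Dict(:a => 4.0, :b => "dict"))   # Dict: values matched by key

@assert nrow(df) == 4
```

Each call adds exactly one row, so the values have to be scalars that fit the column element types.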
using DataFrames
dflong = DataFrame(a=Float64[])
for i = 1:3
append!(dflong, DataFrame(a=rand(i)))
end
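For the case where each chunk needs several processing steps before it is added, one possible arrangement is to put the per-chunk work in a helper and the loop in a function, which also removes the need for `global`. This is only a sketch: `build_chunk`, `collect_chunks`, and the derived column `b` are hypothetical names, not part of DataFrames.jl.

```julia
using DataFrames

# Hypothetical helper standing in for "many operations" on one chunk.
function build_chunk(i)
    df = DataFrame(a = rand(i))
    df.b = df.a .* i    # an example derived column, purely illustrative
    return df
end

# Keeping the loop inside a function means no `global` declarations are needed.
function collect_chunks(n)
    dflong = DataFrame(a = Float64[], b = Float64[])
    for i in 1:n
        append!(dflong, build_chunk(i))
    end
    return dflong
end

dflong = collect_chunks(3)
@assert nrow(dflong) == 6    # 1 + 2 + 3 rows
```

The column names and element types of the empty data frame have to match what the helper returns, so they are spelled out up front.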
This is a situation where allowing push! and append! to add new columns if the data frame has zero columns would be convenient. Not sure whether that justifies this exception.
You should also be able to vcat a DataFrame with a Dict provided the symbols are the same as the DataFrame's columns. Since a Dict is lighter weight (I think) this might be a solution depending on the details of your problem.
append! should be OK, but push! is problematic, because:
if what we push is a vector/tuple we do not have column names
if what we push is a dict/named tuple the current behavior of push! is to add only a selection of columns that already exist in a DataFrame, so we would add no columns.
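For reference, as far as I know later DataFrames.jl releases (1.x) did adopt a special case along these lines: `append!` copies the columns of the second data frame when the target has none, and `push!` creates columns from a NamedTuple row. Treat the following as a sketch that assumes such a version is installed; on the version discussed in this thread it may well throw.

```julia
using DataFrames

# append! into a data frame with zero columns (assumes DataFrames.jl 1.x).
dflong = DataFrame()
for i in 1:3
    append!(dflong, DataFrame(a = rand(i)))
end
@assert nrow(dflong) == 6

# push! a NamedTuple into a data frame with zero columns:
# column names and order are taken from the row.
rows = DataFrame()
push!(rows, (a = 1.0, b = "x"))
@assert names(rows) == ["a", "b"]
```

A tuple or vector row still carries no column names, so the concern about those row types remains.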