I’m trying to append data to a data frame inside a nested loop, similar to the following example, but the resulting data frame is empty. How can I fix this?
In my actual code I’ll call another function, which takes year and month as arguments, instead of DataFrame; that function doesn’t have the issue.
using DataFrames
df = DataFrame()
for year in 1993:2020, month in 1:12
    append!(df, DataFrame(A=year, B=month))  # add one row per (year, month) pair
end
df
To be more precise: both append! and push! work here, but push! is faster. Here is a comparison:
julia> function f1()
           df = DataFrame()
           for year in 1993:2020, month in 1:12
               append!(df, DataFrame(A=year, B=month))
           end
           return df
       end
f1 (generic function with 1 method)
julia> function f2()
           df = DataFrame()
           for year in 1993:2020, month in 1:12
               push!(df, (A=year, B=month))
           end
           return df
       end
f2 (generic function with 1 method)
julia> function f3()
           df = DataFrame(A=Int[], B=Int[])
           for year in 1993:2020, month in 1:12
               push!(df, (year, month))
           end
           return df
       end
f3 (generic function with 1 method)
julia> @time f1();
0.001682 seconds (13.13 k allocations: 1020.641 KiB)
julia> @time f2();
0.000212 seconds (1.71 k allocations: 81.266 KiB)
julia> @time f3();
0.000123 seconds (378 allocations: 23.922 KiB)
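A small caveat when reproducing timings like these: the first call to a Julia function also includes compilation time, so it helps to call the function once before measuring. A minimal sketch using only Base's @elapsed, with f3 redefined so the snippet stands alone (the variable name t is just for illustration, and the absolute numbers will differ by machine):

```julia
using DataFrames

function f3()
    df = DataFrame(A=Int[], B=Int[])
    for year in 1993:2020, month in 1:12
        push!(df, (year, month))
    end
    return df
end

f3()               # warm-up call, so compilation is not part of the measurement
t = @elapsed f3()  # seconds taken by the second, already-compiled call
```

For more stable numbers one would usually repeat the measurement several times rather than rely on a single run.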
As a side note: thanks to @Ronis_BR it is now very easy to check visually which version of DataFrames.jl someone is working with (this matters in some cases, though fortunately not this time).
This is probably not specific to DataFrames, but why do push! and append! behave differently in this case? If one of them is faster, shouldn’t both end up calling the same methods?
push! is for single elements, in this case pushing a “row” (NamedTuple) to a collection of rows (DataFrame). append! is for concatenating collections (DataFrames), so you can append! two data frames.
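To make the distinction concrete, here is a minimal sketch that uses both functions on the same data frame (the column names and values are only illustrative):

```julia
using DataFrames

df = DataFrame(A=Int[], B=Int[])

# push! adds exactly one row; here the row is a NamedTuple
push!(df, (A=1993, B=1))

# append! adds every row of another table; here a two-row DataFrame
append!(df, DataFrame(A=[1993, 1993], B=[2, 3]))

nrow(df)  # 3
```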
The issue is not the speed of append! itself; it’s the cost of constructing a DataFrame on every loop iteration, which you have to do in order to call append!.
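If the rows can be produced as plain NamedTuples, another way to avoid building a DataFrame per iteration is to collect all rows first and construct the data frame once at the end. This is only a sketch of that idea, not something benchmarked in this thread; the name rows is illustrative:

```julia
using DataFrames

# Collect one NamedTuple per (year, month) pair; no DataFrame is created here
rows = [(A=year, B=month) for year in 1993:2020 for month in 1:12]

# A vector of NamedTuples is a valid row table, so the DataFrame is built once
df = DataFrame(rows)
```

This gives the same 336 rows as the loops above, in the same order (all months of 1993 first, then 1994, and so on).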