DataFrame in Nested Loop

I’m trying to append data to a DataFrame inside a nested loop, similar to the example below, but the resulting DataFrame is empty. How can I fix this?
In my real code I call another function, which takes the year and month as arguments, instead of DataFrame, and that function doesn’t have the issue.

using DataFrames

df = DataFrame()
for year in 1993:2020, month in 1:12
    append!(df,DataFrame(A=year,B=month))
end
df

It seems to work here:

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame


julia> for year in 1993:2020, month in 1:12
           append!(df,DataFrame(A=year,B=month))
       end

julia> df
336×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1993  │ 1     │
│ 2   │ 1993  │ 2     │
│ 3   │ 1993  │ 3     │
│ 4   │ 1993  │ 4     │
│ 5   │ 1993  │ 5     │
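The 336 rows come from how the loop header is read: in for year in 1993:2020, month in 1:12, the first range listed (year) is the outer loop and month varies fastest, giving 28 × 12 = 336 iterations. A small Base-only sketch that records the visited pairs (the vector name visited is just illustrative):

```julia
# Record each (year, month) pair in the order the nested loop visits them.
visited = Tuple{Int,Int}[]
for year in 1993:2020, month in 1:12
    push!(visited, (year, month))
end

length(visited)   # → 336
visited[1:3]      # → [(1993, 1), (1993, 2), (1993, 3)]
visited[end]      # → (2020, 12)
```

This matches the row order in the data frame above: row 2 is (1993, 2), not (1994, 1).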


Side note: you probably want push! here:


julia> df = DataFrame()
0×0 DataFrame


julia> for year in 1993:2020, month in 1:12
           push!(df,(A=year,B=month))
       end

To be more precise: both append! and push! work, but push! is faster. Here is a comparison:

julia> function f1()
       df = DataFrame()
       for year in 1993:2020, month in 1:12
           append!(df, DataFrame(A=year,B=month))
       end
       return df
       end
f1 (generic function with 1 method)

julia> function f2()
       df = DataFrame()
       for year in 1993:2020, month in 1:12
           push!(df, (A=year,B=month))
       end
       return df
       end
f2 (generic function with 1 method)

julia> function f3()
       df = DataFrame(A=Int[], B=Int[])
       for year in 1993:2020, month in 1:12
           push!(df, (year,month))
       end
       return df
       end
f3 (generic function with 1 method)

julia> @time f1();
  0.001682 seconds (13.13 k allocations: 1020.641 KiB)

julia> @time f2();
  0.000212 seconds (1.71 k allocations: 81.266 KiB)

julia> @time f3();
  0.000123 seconds (378 allocations: 23.922 KiB)

(timings were measured after compilation)
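Another option, not benchmarked above, is to build all rows first with a comprehension and construct the data frame once at the end. The sketch below only builds the row vector in Base Julia; passing it to DataFrame (e.g. DataFrame(rows)) should then work because a Vector of NamedTuples is a Tables.jl row table, but its speed relative to f1, f2 and f3 is untested here.

```julia
# Build every (A = year, B = month) row in one comprehension.
# The first `for` is the outer loop, so the order matches f1, f2 and f3.
rows = [(A = year, B = month) for year in 1993:2020 for month in 1:12]

length(rows)   # → 336
rows[1]        # → (A = 1993, B = 1)

# With DataFrames loaded, the table can then be built in a single step:
# using DataFrames
# df = DataFrame(rows)
```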


Finally, consider using the excellent Dates standard library for this:

julia> using Dates, DataFrames

julia> df = DataFrame(date = Date(1993,1):Month(1):Date(2020, 12));

julia> df.year = year.(df.date); df.month = month.(df.date);

julia> first(df, 5)
5×3 DataFrame
 Row │ date        year   month 
     │ Date        Int64  Int64 
─────┼──────────────────────────
   1 │ 1993-01-01   1993      1
   2 │ 1993-02-01   1993      2
   3 │ 1993-03-01   1993      3
   4 │ 1993-04-01   1993      4
   5 │ 1993-05-01   1993      5
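If you only need the year/month values and not a data frame, the same range works with just the Dates standard library. This sketch reproduces the year and month columns above as plain vectors (the names dates, years and months are illustrative):

```julia
using Dates

# One Date per month, from January 1993 through December 2020 (inclusive).
dates = Date(1993, 1):Month(1):Date(2020, 12)

# Broadcast the Dates accessors over the range, as in the data frame example.
years  = year.(dates)
months = month.(dates)

length(dates)               # → 336
(years[1], months[1])       # → (1993, 1)
(years[end], months[end])   # → (2020, 12)
```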

As a side note: thanks to @Ronis_BR, it is easy to tell at a glance which version of DataFrames.jl someone is using, since the table printing style differs between versions (compare the two outputs above). This matters in some cases, though fortunately not this time.


This is probably not specific to DataFrames, but why do push! and append! behave differently here? If one of them is faster, shouldn’t both end up calling the same methods?

push! adds a single element: here, it pushes one “row” (a NamedTuple) onto a collection of rows (the DataFrame). append! concatenates collections (here, DataFrames), so you can append! one data frame to another.
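The same distinction holds for ordinary Julia vectors, which may make it easier to see. A small Base-only sketch (the vectors v and w are just illustrative):

```julia
v = Int[]
push!(v, 1)          # push! adds one element               → [1]
append!(v, [2, 3])   # append! adds every element of [2, 3] → [1, 2, 3]

# push! treats its argument as a single element, even if it is a vector.
# That only works when the element type allows it, so use a Vector{Any} here.
w = Any[]
push!(w, [2, 3])     # one element added         → Any[[2, 3]]
append!(w, [2, 3])   # two elements added        → Any[[2, 3], 2, 3]
```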

The issue is not the speed of append! itself; it is the cost of constructing a new DataFrame on every iteration of the loop, which you have to do in order to call append!.
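The same effect can be reproduced with a plain Vector, independent of DataFrames. In this sketch (the function names with_append and with_push are hypothetical), the append! version has to wrap each value in a temporary one-element vector before appending it, which is the analogue of building a one-row DataFrame on every iteration:

```julia
# append! needs a collection, so each iteration allocates a fresh one-element vector.
function with_append()
    v = Tuple{Int,Int}[]
    for year in 1993:2020, month in 1:12
        append!(v, [(year, month)])
    end
    return v
end

# push! adds the tuple directly, with no temporary container.
function with_push()
    v = Tuple{Int,Int}[]
    for year in 1993:2020, month in 1:12
        push!(v, (year, month))
    end
    return v
end

with_append(); with_push()             # run once so compilation is excluded below
@assert with_append() == with_push()   # both build the same 336 pairs

# Bytes allocated per call; exact numbers depend on the Julia version.
println("append!: ", @allocated(with_append()), " bytes")
println("push!:   ", @allocated(with_push()), " bytes")
```

On typical Julia versions the append! variant reports noticeably more allocation, for the same reason f1 allocates more than f2 above.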


Thanks, I didn’t know Dates could be used this way. I’d also expect it to be faster than either append! or push!.