DataFramesMeta release thread

Release announcements of DataFramesMeta

Hello everyone! I am pleased to announce that yesterday DataFramesMeta had its 0.6.0 release, a breaking change from the previous version, 0.5.1. This post describes the new release. This thread will also serve as the place where we announce future releases.

Breaking changes:

  • @byrow! is deprecated in favor of @eachrow. This was done for two reasons. First, @byrow! is a bad name because it actually returns a fresh data frame rather than modifying the input. Second, we would like to leave the @byrow open to mirror DataFrames’s ByRow function wrapper in a future release. Usage is
julia> using DataFramesMeta

julia> df = DataFrame(a = [1, 2], b = [3, 4]);

julia> @eachrow df begin 
       :a = :b * 100
       end
2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │   300      3
   2 │   400      4
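For comparison, the same row-wise update can be written with the DataFrames ByRow wrapper mentioned above. This is only a sketch in plain DataFrames, not the code that @eachrow generates:

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

# ByRow applies the function to each element of :b.
# Writing the result to :a replaces the existing column.
# Like @eachrow, transform returns a new data frame and leaves df untouched.
result = transform(df, :b => ByRow(b -> b * 100) => :a)
```

Here result.a holds the updated values, while df.a is still [1, 2].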
  • @where with a GroupedDataFrame now selects rows, not groups. Previously, @where filtered whole groups of a grouped data frame. We decided that having @where perform an operation by group and then filter rows of the parent data frame is more convenient, makes code easier to reason about, and prevents unexpected edge cases. The change also makes @where more consistent with @select and @transform.
julia> using Statistics

julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);

julia> @where(groupby(df, :a), :b .> mean(:b))
2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1    100
   2 │     2    200
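To make the new semantics concrete, here is a rough two-step equivalent in plain DataFrames. It is a sketch of the behavior described above, not the macro's actual implementation, and the names flagged and keep are made up for illustration:

```julia
using DataFrames, Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# Step 1: evaluate the condition within each group, so mean(b) is the group mean.
flagged = transform(groupby(df, :a), :b => (b -> b .> mean(b)) => :keep)

# Step 2: use the resulting Boolean column to filter rows of the parent data frame.
result = df[flagged.keep, :]
```

The result keeps one row from each group, the one whose b exceeds that group's mean.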
  • @orderby on a GroupedDataFrame is now reserved, and will error. As with @where, above, the previous behavior re-ordered groups. This was a source of unexpected behavior and was inconsistent with @select and @transform. However, there wasn’t consensus on what its exact behavior on a GroupedDataFrame should be, or on how to make it consistent with DataFrames.jl, so it is reserved for future improvements.

  • @based_on is renamed to @combine to be more consistent with DataFrames.

julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);

julia> @combine(groupby(df, :a), b_max = maximum(:b))
2×2 DataFrame
 Row │ a      b_max 
     │ Int64  Int64 
─────┼──────────────
   1 │     1    100
   2 │     2    200
  • @transform with a GroupedDataFrame no longer re-orders rows. Its behavior now matches that of DataFrames.transform.
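As an illustration of the DataFrames.transform behavior being matched, here is a small sketch in plain DataFrames (assuming DataFrames 0.22 or later). The data are made up so that the groups are interleaved:

```julia
using DataFrames, Statistics

# Groups are deliberately interleaved so any re-ordering would be visible.
df = DataFrame(a = [2, 1, 2, 1], b = [1, 2, 3, 4])

# Add the per-group mean of :b as a new column.
result = transform(groupby(df, :a), :b => mean => :b_mean)

# Rows come back in the order of df, not sorted or clustered by group:
# result.a is [2, 1, 2, 1] and result.b_mean is [2.0, 3.0, 2.0, 3.0].
```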

  • You can now use cols on the LHS of an expression to work with column names programmatically. As someone with lots of Stata experience, I am particularly excited about this change.

julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);

julia> c_str = "c";

julia> @transform(df, cols(c_str) = :a .+ :b)
4×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      2
   2 │     1    100    101
   3 │     2      2      4
   4 │     2    200    202
  • There may be some increase in latency due to the re-write of DataFramesMeta macros to use their corresponding DataFrames functions as backends. For example the call
julia> @transform(df, c = :a .+ :b)

lowers to

julia> transform(df, [:a, :b] => ((a, b) -> (a .+ b)) => :c)

which carries the compilation cost of both the newly created anonymous function and the transform infrastructure. Worry not! Both Julia 1.6 and DataFrames 0.22 seem to reduce this problem significantly, and we are actively exploring solutions.

I hope you enjoy the new developments!

Future priorities include

  • Allowing arbitrary expressions inside @transform rather than just those of the form y = f(:x). This will allow you to use the DataFrames transformation mini-language of src => fun => dest alongside y = f(:x) calls, like
julia> @transform(df, 
	z = :x .+ :y, 
	AsTable(Not(:q)) => myfun => :c)
  • Mutating macros, such as @transform! and @select!

  • Support for AsTable outputs in @transform

  • Support for keyword arguments in macros. For example, DataFrames.combine accepts the keyword argument ungroup. When ungroup is false, combine returns a grouped data frame. Supporting this requires more robust expression handling in the macro.
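To show what the ungroup keyword does today in plain DataFrames, here is a minimal sketch (assuming DataFrames 0.22 or later). It says nothing about how a future macro syntax might look:

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])
gd = groupby(df, :a)

# Default (ungroup = true): the result is a plain DataFrame.
flat = combine(gd, :b => maximum => :b_max)

# ungroup = false: the result stays grouped by :a.
grouped = combine(gd, :b => maximum => :b_max; ungroup = false)
```

Here flat is a DataFrame with one row per group, while grouped is a GroupedDataFrame with two groups that can be passed straight into another grouped operation.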

Enjoy!
