Release announcements of DataFramesMeta
Hello everyone! I am pleased to announce that yesterday DataFramesMeta had its
0.6.0 release, a breaking change from the previous version,
0.5.1. This post describes the new release. This thread will also serve as a place where we announce future releases.
- @byrow! is deprecated in favor of @eachrow. This was done for two reasons. First, @byrow! is a bad name because it actually returns a fresh data frame rather than modifying the input. Second, we would like to leave the name @byrow open to mirror DataFrames’s ByRow function wrapper in a future release. Usage is
julia> using DataFramesMeta

julia> df = DataFrame(a = [1, 2], b = [3, 4]);

julia> @eachrow df begin
           :a = :b * 100
       end
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   300      3
   2 │   400      4
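To make the first reason concrete, here is a minimal sketch showing that @eachrow leaves its input alone. The name df2 is only for illustration.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 2], b = [3, 4])

# @eachrow returns a new data frame, so the result has to be captured
df2 = @eachrow df begin
    :a = :b * 100
end

df2.a  # holds the updated values
df.a   # the input column is unchanged
```

If you relied on the old name suggesting in-place modification, remember to assign the result.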
- @where on a GroupedDataFrame now selects rows, not groups. Previously, @where would filter a grouped data frame. We think that having @where perform an operation by group and then filter rows of the parent data frame is more convenient, makes code easier to reason about, and prevents unexpected edge cases. The change also makes @where more consistent with
julia> using Statistics

julia> df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200]);

julia> @where(groupby(df, :a), :b .> mean(:b))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1    100
   2 │     2    200
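One way to picture the new semantics is to spell out the two steps with plain DataFrames functions. This is only a sketch of the behavior described above, not the macro's actual implementation. The helper column keep is a made-up name, and the sketch assumes a DataFrames version in which transform on a grouped data frame keeps the parent's row order.

```julia
using DataFramesMeta, Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# Step 1: evaluate the condition within each group
# (:keep is a hypothetical helper column)
flagged = transform(groupby(df, :a), :b => (b -> b .> mean(b)) => :keep)

# Step 2: filter rows of the parent data frame and drop the helper column
result = select(flagged[flagged.keep, :], Not(:keep))
```

The result is an ordinary data frame containing the rows whose b exceeds the mean of b within their own group.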
- GroupedDataFrame is now reserved, and will error. Similar to @where, above, the previous behavior re-ordered groups. This was a source of unexpected behavior and was inconsistent with @transform. However, because there wasn’t consensus on what its exact behavior on a GroupedDataFrame should be, or on how to make it consistent with DataFrames.jl, it is reserved for future improvements.
- @based_on is renamed to @combine to be more consistent with DataFrames.
julia> df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200]);

julia> @combine(groupby(df, :a), b_max = maximum(:b))
2×2 DataFrame
 Row │ a      b_max
     │ Int64  Int64
─────┼──────────────
   1 │     1    100
   2 │     2    200
- GroupedDataFrame no longer re-orders rows; its behavior now matches that of
- You can now use cols on the LHS of an expression to work with column names programmatically. As someone with lots of Stata experience, I am particularly excited about this change.
julia> df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200]);

julia> c_str = "c";

julia> @transform(df, cols(c_str) = :a .+ :b)
4×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     1    100    101
   3 │     2      2      4
   4 │     2    200    202
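Where this pays off is inside functions, when the new column's name is only known at run time. Below is a small sketch along those lines. The helper add_sum and the column name "total" are made up for illustration, and the code assumes the cols syntax of this release.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# add_sum is a hypothetical helper; newname is a string chosen by the caller
function add_sum(df, newname)
    return @transform(df, cols(newname) = :a .+ :b)
end

out = add_sum(df, "total")
names(out)
```

The same function can then be reused with different destination names without editing the macro call.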
- There may be some increase in latency due to the rewrite of DataFramesMeta macros to use their corresponding DataFrames functions as backends. For example, the call
julia> @transform(df, c = :a .+ :b)
is translated to
julia> transform(df, [:a, :b] => ((a, b) -> (a .+ b)) => :c)
which carries the compilation cost of the anonymous function as well as the cost of the transform infrastructure. Worry not! Both Julia 1.6 and DataFrames 0.22 seem to reduce this problem significantly, and we are actively exploring solutions.
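If you want to convince yourself that the macro and the function form agree, a quick check like the following should do. The variable names are just for illustration.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# Macro form
via_macro = @transform(df, c = :a .+ :b)

# Function form, with the anonymous function written out by hand
via_function = transform(df, [:a, :b] => ((a, b) -> a .+ b) => :c)

via_macro == via_function
```

Because each macro call introduces its own anonymous function, the first call pays the compilation cost, while later runs of the same compiled code should be fast.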
I hope you enjoy the new developments!
Future priorities include
- Allowing arbitrary expressions inside @transform rather than just those of the form y = f(:x). This will allow you to use the DataFrames transformation mini-language of src => fun => dest alongside y = f(:x) calls, like
julia> @transform(df, z = :x .+ :y, AsTable(Not(:q)) => myfun => :c)
- Mutating macros, such as
- Support for keyword arguments in macros. For example, DataFrames.combine accepts a keyword argument that controls whether combine returns a grouped data frame. Supporting this requires more robust expression handling in the macro.
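Until these land in the macros, both the mini-language and keyword arguments are reachable by calling the DataFrames functions directly. The sketch below assumes a recent DataFrames version. myfun and the column names are stand-ins made up for the example, and ungroup is, to my knowledge, the combine keyword that controls whether a grouped data frame is returned.

```julia
using DataFramesMeta

df = DataFrame(x = [1, 2], y = [10, 20], q = ["u", "v"])

# myfun is a hypothetical function; with AsTable it receives a NamedTuple of columns
myfun(nt) = nt.x .* nt.y

# Mini-language pairs of the form src => fun => dest, passed to transform directly
out = transform(df,
    [:x, :y] => ((x, y) -> x .+ y) => :z,
    AsTable(Not(:q)) => myfun => :c)

# Keyword arguments are available on the function form of combine
gd = combine(groupby(df, :q), :x => sum => :x_sum; ungroup = false)
gd isa GroupedDataFrame
```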