Release announcements of DataFramesMeta
Hello everyone! I am pleased to announce that yesterday DataFramesMeta had its 0.6.0 release, a breaking change from the previous version, 0.5.1. This post describes the new release. This thread will also serve as a place where we announce future releases.
Breaking changes:
- @byrow! is deprecated in favor of @eachrow. This was done for two reasons. First, @byrow! is a bad name because it actually returns a fresh data frame rather than modifying the input. Second, we would like to leave the name @byrow open to mirror DataFrames's ByRow function wrapper in a future release. Usage is
julia> using DataFramesMeta
julia> df = DataFrame(a = [1, 2]; b = [3, 4]);
julia> @eachrow df begin
           :a = :b * 100
       end
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   300      3
   2 │   400      4
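For comparison, here is a minimal sketch of the DataFrames ByRow wrapper that the @byrow name is being kept free for. It uses only DataFrames itself, not DataFramesMeta, and the column names simply follow the example above.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

# ByRow wraps a scalar function so that it is applied to each element of
# the column rather than to the column vector as a whole.
result = transform(df, :b => ByRow(b -> b * 100) => :a)

# transform returns a new data frame; the input df is not modified.
println(result)
println(df)
```

Like @eachrow, this returns a fresh data frame and leaves df untouched.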
- @where with a GroupedDataFrame now selects rows, not groups. Previously, @where would filter out whole groups of a grouped data frame. We thought that having @where perform an operation by group and then filter rows of the parent data frame was more convenient behavior, makes code easier to reason about, and prevents unexpected edge cases. The change also makes @where more consistent with @select and @transform.
julia> using Statistics
julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);
julia> @where(groupby(df, :a), :b .> mean(:b))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1    100
   2 │     2    200
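To make the "operate by group, then filter rows of the parent" idea concrete, here is a rough sketch of the same result using plain DataFrames calls. This is only an illustration of the semantics, not necessarily how @where is implemented; the helper column name keep is my own invention.

```julia
using DataFrames, Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# Step 1: compute a Boolean mask group by group. Each group compares its
# own values of :b against its own mean. The hypothetical :keep column
# holds the mask.
flagged = transform(groupby(df, :a), :b => (b -> b .> mean(b)) => :keep)

# Step 2: use the mask to select rows of the parent data frame.
result = df[flagged.keep, :]

println(result)
```

The groups here are contiguous in df, so the mask lines up with the parent rows regardless of how a given DataFrames version orders the output of a grouped transform.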
- @orderby on a GroupedDataFrame is now reserved, and will error. Similar to @where, above, the previous behavior re-ordered groups. This was a source of unexpected behavior and was inconsistent with @select and @transform. However, because there wasn't consensus on what its exact behavior on a GroupedDataFrame should be, or on how to make it consistent with DataFrames.jl, it is reserved for future improvements.
- @based_on is renamed to @combine to be more consistent with DataFrames.
julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);
julia> @combine(groupby(df, :a), b_max = maximum(:b))
2×2 DataFrame
 Row │ a      b_max
     │ Int64  Int64
─────┼──────────────
   1 │     1    100
   2 │     2    200
- @transform with a GroupedDataFrame no longer re-orders rows; its behavior now matches that of DataFrames.transform.
- You can now use cols on the LHS of an expression to work with column names programmatically. As someone with lots of Stata experience, I am particularly excited about this change.
julia> df = DataFrame(a = [1, 1, 2, 2],b = [1, 100, 2, 200]);
julia> c_str = "c";
julia> @transform(df, cols(c_str) = :a .+ :b)
4×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     1    100    101
   3 │     2      2      4
   4 │     2    200    202
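If you are curious what the plain DataFrames counterpart looks like, here is a small sketch. DataFrames accepts a string as the destination column name, so the name can come from a variable. I am assuming this is roughly the kind of call the cols form corresponds to; I have not checked the exact expansion.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])
c_str = "c"

# The destination of a src => fun => dest pair may be a String, so the
# new column's name is taken from the variable c_str at run time.
result = transform(df, [:a, :b] => ((a, b) -> a .+ b) => c_str)

println(names(result))
```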
- There may be some increase in latency due to the re-write of DataFramesMeta macros to use their corresponding DataFrames functions as backends. For example the call
julia> @transform(df, c = :a .+ :b)
lowers to
julia> transform(df, [:a, :b] => ((a, b) -> (a .+ b)) => :c)
which carries the compilation cost of both the newly created anonymous function and the transform infrastructure. Worry not! Both Julia 1.6 and DataFrames 0.22 seem to reduce this problem significantly, and we are actively exploring solutions.
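If you want to get a feel for this latency yourself, a rough sketch is to time the same DataFrames call twice in a fresh session. The helper name add_cols is hypothetical, and the numbers will vary by machine and by Julia and DataFrames version, so I am not quoting any here.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])

# Hypothetical named helper. A named function is compiled once and then
# reused, whereas every anonymous function written out in source code is
# a distinct type and is compiled separately.
add_cols(a, b) = a .+ b

# The first call includes compilation time; the second call reuses the
# code that was already compiled.
first_call = @elapsed transform(df, [:a, :b] => add_cols => :c)
second_call = @elapsed transform(df, [:a, :b] => add_cols => :c)

println("first call:  ", first_call, " seconds")
println("second call: ", second_call, " seconds")
```

In a fresh session the first timing is usually dominated by compilation, which is the cost described above.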
I hope you enjoy the new developments!
Future priorities include
- Allowing arbitrary expressions inside @transform rather than just those of the form y = f(:x). This will allow you to use the DataFrames transformation mini-language of src => fun => dest alongside y = f(:x) calls, like
julia> @transform(df,
           z = :x .+ :y,
           AsTable(Not(:q)) => myfun => :c)
- Mutating macros, such as @transform! and @select!
- Support for AsTable outputs in @transform
- Support for keyword arguments in macros. For example, DataFrames.combine accepts the keyword argument ungroup. When ungroup is false, combine returns a grouped data frame. Supporting this requires more robust expression handling in the macro.
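For reference, here is a small sketch of the existing DataFrames features these priorities would build on, using plain DataFrames calls only. None of this is DataFramesMeta syntax, and the column name total is made up for the example.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [1, 100, 2, 200])
gd = groupby(df, :a)

# By default combine returns an ordinary DataFrame ...
combined = combine(gd, :b => maximum => :b_max)

# ... while ungroup = false keeps the result grouped.
still_grouped = combine(gd, :b => maximum => :b_max; ungroup = false)

# The src => fun => dest mini-language with AsTable as the source: with
# ByRow, the function receives one NamedTuple per row.
with_total = transform(df, AsTable([:a, :b]) => ByRow(sum) => :total)

println(typeof(still_grouped))
println(with_total)
```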
Enjoy!