I am happy to announce a new release of DataFramesMeta. This release contains three important additions:
- `@byrow`. Being able to perform transformations by row, rather than by broadcasting, has been a long-requested feature for DataFramesMeta. This release introduces `@byrow`, a macro-like flag used inside DataFramesMeta macros.
```julia
julia> using DataFramesMeta
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @transform df @byrow c = :a == 1 ? 100 : 200
3×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      4    100
   2 │     2      5    200
   3 │     3      6    200
```
It can be used inside `@transform`, `@select`, `@where`, `@orderby`, and `@combine` (though it's not very useful in `@combine`).
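For comparison, here is a rough equivalent of the `@transform` example written directly in the DataFrames `source => fun => dest` mini-language, using `ByRow` to apply a function to one element at a time. This is a sketch for intuition, assuming DataFrames 1.x; it is not necessarily the exact expression that `@byrow` generates.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Roughly what `@transform df @byrow c = :a == 1 ? 100 : 200` computes:
# ByRow wraps the anonymous function so it is called on each element of :a,
# and the results are stored in a new column :c.
df2 = transform(df, :a => ByRow(a -> a == 1 ? 100 : 200) => :c)
```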
It can also be used in `@with`, where it's roughly equivalent to `map`.
```julia
julia> @with df @byrow :a * :b
3-element Vector{Int64}:
  4
 10
 18
```
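Because `@with df @byrow :a * :b` behaves roughly like `map`, the same result can be sketched in plain Julia by mapping a two-argument function over the two columns. This only illustrates the semantics; it is not the macro's implementation.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Apply the row-level expression `a * b` to each pair (df.a[i], df.b[i]).
prods = map((a, b) -> a * b, df.a, df.b)

# For an expression this simple, broadcasting gives the same answer.
same_as_broadcast = prods == df.a .* df.b
```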
- `@eachrow!`, an in-place version of `@eachrow`. The key benefit of `@eachrow` is that it creates a fast iterator through the rows of a data frame, which matters because `for row in eachrow(df)` is slow in Base DataFrames. Unfortunately, `@eachrow` always returns a new data frame, which undercuts the speed of the implementation. `@eachrow!` fixes that by modifying the data frame in place.
```julia
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @eachrow! df begin
           :a = :b * 100
       end
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6
julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6
```
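To see why an in-place row loop can be fast, here is a hand-written loop of the general kind a fast row iterator aims for: it receives the column vectors themselves (`df.a` returns the stored column, not a copy) and writes into them inside a function, so the compiler knows the concrete element types. The helper name `scale_into!` is hypothetical, and this is an illustration of the idea, not what `@eachrow!` actually expands to.

```julia
using DataFrames

# Hypothetical helper: overwrite `a` with `b .* factor`, one row at a time.
# Taking plain vectors as arguments acts as a function barrier, so the loop
# is compiled for the concrete element types of the columns.
function scale_into!(a::AbstractVector, b::AbstractVector, factor)
    for i in eachindex(a, b)   # errors if the two columns have different indices
        a[i] = b[i] * factor
    end
    return a
end

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# df.a is the column stored in df, so this modifies df in place,
# similar in effect to `@eachrow! df begin :a = :b * 100 end`.
scale_into!(df.a, df.b, 100)
```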
- Multiple operations in a block. In implementing `@byrow` as a macro flag, we realized that, due to Julia's parsing rules, `@transform(df, @byrow y = f(:x), @byrow z = g(:x))` wouldn't work without additional parentheses. So we needed a new syntax that works with macro flags (like `@byrow` and future additions). The solution was to allow multiple operations in a block.
```julia
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @transform df begin
           c = :a .+ 100
           d = :a .* :b
       end
3×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4    101      4
   2 │     2      5    102     10
   3 │     3      6    103     18
```
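Each line of the block corresponds to one `source => fun => dest` pair in DataFrames. A rough equivalent written directly with `DataFrames.transform`, assuming DataFrames 1.x (a sketch, not the exact code DataFramesMeta generates):

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Without ByRow, each function receives whole columns, so the arithmetic
# is broadcast explicitly with `.+` and `.*`, just as in the block above.
df2 = transform(df,
    :a => (a -> a .+ 100) => :c,
    [:a, :b] => ((a, b) -> a .* b) => :d)
```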
For people who perform multiple transformations by row, `@byrow` can be placed at the top of the block to signal that all transformations in it are applied by row.
```julia
julia> @transform df @byrow begin
           c = "Person $(:a)"
           d = :a * :b
       end
3×4 DataFrame
 Row │ a      b      c         d
     │ Int64  Int64  String    Int64
─────┼───────────────────────────────
   1 │     1      4  Person 1      4
   2 │     2      5  Person 2     10
   3 │     3      6  Person 3     18
```
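With `@byrow` at the top of the block, every operation works on scalars, much like wrapping each function in `ByRow` in plain DataFrames. A rough equivalent, assuming DataFrames 1.x and offered only as an illustration:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# With ByRow, each function receives the values from one row at a time,
# so no dots are needed: `*` is simply called on (a[i], b[i]).
df2 = transform(df,
    :a => ByRow(a -> "Person $a") => :c,
    [:a, :b] => ByRow(*) => :d)
```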
Why doesn't DataFramesMeta.jl make row transformations the default?
The improvements in this release of DataFramesMeta center on making it easier to work with a data frame by row. So why not make this the default? Ultimately, DataFramesMeta's goal is to provide an easier syntax for working with DataFrames' `source => fun => dest` mini-language. Because `DataFrames.transform` and `DataFrames.select` act on whole columns, making by-row operations the default could make it difficult for users to switch between the two syntaxes. However, I hope to keep making it easier for people to work with DataFrames however they like.
Future improvements
Planned improvements include:
- Adding `@subset` and `@subset!`, and deprecating `@where`, to improve consistency with Base DataFrames.
- Allowing multi-argument selectors in `@select` (e.g. `Between`, `Not`, etc.).
- Adding more convenience macro flags, such as `@passmissing` and `@missingfalse`, to make working with `missing` values more convenient.
- Quality-of-life improvements, such as using `:x` on the LHS of expressions, as in the recently released DFMacros.jl.
Please file issues if you encounter bugs and to propose new features!