I am happy to announce a new release of DataFramesMeta. This release contains three important additions:
- `@byrow`: Being able to perform transformations by row, rather than by broadcasting, has long been a requested feature for DataFramesMeta. This release introduces `@byrow`, a macro-like syntax used inside DataFramesMeta macros.
```julia
julia> using DataFramesMeta
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @transform df @byrow c = :a == 1 ? 100 : 200
3×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      4    100
   2 │     2      5    200
   3 │     3      6    200
```
It can be used inside `@transform`, `@select`, `@where`, `@orderby`, and `@combine` (though it's not very useful in `@combine`). It can also be used in `@with`, where it's roughly equivalent to `map`.
```julia
julia> @with df @byrow :a * :b
3-element Vector{Int64}:
  4
 10
 18
```
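To see what `@byrow` saves you from writing, here is a minimal sketch in plain Julia (Base only, with two ordinary vectors standing in for the columns `:a` and `:b`). It is an illustration of the two styles, not how DataFramesMeta implements `@byrow`: the ternary operator needs a single `Bool`, so the vectorized version has to switch to `ifelse.`, while the row-wise version keeps the scalar expression as written.

```julia
# Plain vectors standing in for the columns :a and :b (illustration only).
a = [1, 2, 3]
b = [4, 5, 6]

# Broadcasting style: `a .== 1 ? 100 : 200` would throw, because `? :`
# needs a single Bool, so the vectorized form uses `ifelse.` instead.
c_broadcast = ifelse.(a .== 1, 100, 200)

# Row-wise style: write the scalar expression once, apply it per element.
c_rowwise = map(x -> x == 1 ? 100 : 200, a)

# `@with df @byrow :a * :b` behaves roughly like this `map` over two columns.
ab = map((x, y) -> x * y, a, b)

println(c_broadcast)  # [100, 200, 200]
println(ab)           # [4, 10, 18]
```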
- `@eachrow!`: an in-place version of `@eachrow`. The key benefit of `@eachrow` is that it creates a fast iterator through the rows of a data frame, which matters because `for row in eachrow(df)` is slow in base DataFrames. Unfortunately, `@eachrow` always returns a new data frame, which undercuts the speed of the implementation. `@eachrow!` fixes that by modifying the data frame in place.
```julia
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @eachrow! df begin
           :a = :b * 100
       end
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6
julia> df
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6
```
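For comparison, here is a hand-written sketch of the same in-place effect using only DataFrames. It is not the code `@eachrow!` generates; it just shows the idea of looping over row indices on concrete column vectors. It relies on `df.a` returning the column stored in `df` without copying, so writing into it modifies `df` itself.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# `df.a` and `df.b` are the column vectors stored in `df` (no copy),
# so assigning into `a` mutates the data frame in place.
a, b = df.a, df.b

# A type-stable loop over row indices on concrete vectors, instead of
# iterating `for row in eachrow(df)`.
for i in eachindex(a, b)
    a[i] = b[i] * 100
end

println(df.a)  # [400, 500, 600]
```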
- Multiple operations in a block: While implementing `@byrow` as a macro-flag, we realized that, because of how Julia parses macro calls, `@transform(df, @byrow y = f(:x), @byrow z = g(:x))` wouldn't work without adding more parentheses. So we needed a new syntax for using macro-flags (like `@byrow` and future additions). The solution was to allow multiple operations in a block.
```julia
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
julia> @transform df begin
           c = :a .+ 100
           d = :a .* :b
       end
3×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4    101      4
   2 │     2      5    102     10
   3 │     3      6    103     18
```
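DataFramesMeta's macros are a friendlier syntax for DataFrames' `source => fun => dest` mini-language. As a rough, hand-written equivalent (not the macro's exact expansion), the block of two operations above corresponds to one `transform` call with two pairs. The functions receive whole columns, which is why the block uses `.+` and `.*`.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# One `source => function => destination` pair per line of the block.
# Each function receives entire columns, so it broadcasts explicitly.
out = transform(df,
    :a => (x -> x .+ 100) => :c,
    [:a, :b] => ((x, y) -> x .* y) => :d)
```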
For people who perform multiple transformations by row, we allow `@byrow` at the top of the block to signal that every transformation in the block is applied by row.
```julia
julia> @transform df @byrow begin
           c = "Person $(:a)"
           d = :a * :b
       end
3×4 DataFrame
 Row │ a      b      c         d
     │ Int64  Int64  String    Int64
─────┼───────────────────────────────
   1 │     1      4  Person 1      4
   2 │     2      5  Person 2     10
   3 │     3      6  Person 3     18
```
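In plain DataFrames, the by-row block above corresponds roughly to wrapping each function in `ByRow`, which tells DataFrames to apply a scalar function element by element. This is a hand-written comparison, not the code DataFramesMeta generates.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# `ByRow` wraps a scalar function so it is applied to each row's values,
# so no broadcasting dots are needed inside the functions.
out = transform(df,
    :a => ByRow(x -> "Person $x") => :c,
    [:a, :b] => ByRow(*) => :d)
```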
Why doesn't DataFramesMeta.jl make row transformations the default?
The improvements in this release of DataFramesMeta center on making it easier to work with a data frame by row. So why not make by-row the default? Ultimately, DataFramesMeta's goal is to provide an easier syntax for DataFrames' `source => fun => dest` mini-language. Because `DataFrames.transform` and `DataFrames.select` pass whole columns to the function, making operations by-row by default could make it difficult for users to switch between the two syntaxes. However, I hope to keep making it easier for people to work with DataFrames in whichever style they prefer.
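A small sketch of why the whole-column default matters, written in plain DataFrames: some operations, such as demeaning, need every value of a column at once and cannot be expressed as a per-row function, while per-row work is an explicit opt-in via `ByRow`. The column names `a_demeaned` and `a_plus_b` are illustrative only.

```julia
using DataFrames
using Statistics: mean

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Whole-column function: demeaning needs the mean of all of :a,
# so the function must see the entire column.
col = transform(df, :a => (x -> x .- mean(x)) => :a_demeaned)

# Per-row function: the same mini-language, opting in with `ByRow`.
row = transform(df, [:a, :b] => ByRow(+) => :a_plus_b)
```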
Future improvements
In the pipeline for future releases:
- Adding `@subset` and `@subset!`, and deprecating `@where`, to improve consistency with base DataFrames.
- Allowing multi-argument selectors in `@select` (e.g. `Between`, `Not`, etc.).
- Adding more convenience macro-flags, such as `@passmissing` and `@missingfalse`, to make working with `missing` values more convenient.
- Quality-of-life improvements, such as using `:x` on the LHS of expressions, as in the recently released DFMacros.jl.
Please file issues if you encounter bugs or want to propose new features!