I made some breaking changes to DataFrameMacros, which hopefully make the package easier to use and more powerful.
- Breaking: The $() interpolation syntax is replaced by {} for single columns (or broadcasted multi-columns).
  - Added {{}} for referring to multiple columns as a tuple.
- Breaking: No more flag macros, replaced by explicit @byrow, @bycol, @passmissing, @astable, which is also what DataFramesMeta uses and which I was finally convinced is better (less confusing) than the single-character macros I liked at first for their brevity.
- Breaking: All(), Between() and Not() have to be interpolated with {} and can't be used standalone anymore.
An example of the new {{}} syntax is below. For a long time I was annoyed that it wasn't easy to refer to multiple columns together in one expression, so that you could aggregate over them and compare the result with some other columns at the same time. I already had implicit broadcasting for {}, but that only makes it easier to run the same expression on each of multiple columns, not on multiple columns together. A {{}} expression is replaced with a tuple of the specified columns, so you can run aggregations on that. Because of the tuple-ization it's probably not useful for very large numbers of columns (too much compilation overhead, I assume), but for normal workloads it should be fine.
julia> using DataFrames, DataFrameMacros, Statistics

julia> df = DataFrame(
           jan = randn(5),
           feb = randn(5),
           mar = randn(5),
           apr = randn(5),
           may = randn(5),
           jun = randn(5),
           jul = randn(5),
       )
5×7 DataFrame
 Row │ jan        feb         mar        apr        may        jun       jul  ⋯
     │ Float64    Float64     Float64    Float64    Float64    Float64   Floa ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539   1.52392  -0.8 ⋯
   2 │  2.04818    0.980968   -0.843878  -0.281133   0.260402  -1.77773   0.3
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489  -2.93306  -0.1
   4 │  0.459416   0.273815    0.327215  -0.71741   -0.880897   0.782258  2.3
   5 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726   2.31358  -0.9 ⋯
                                                                1 column omitted
julia> @select(df, :july_larger = :jul > median({{Between(:jan, :jun)}}))
5×1 DataFrame
 Row │ july_larger
     │ Bool
─────┼─────────────
   1 │       false
   2 │        true
   3 │        true
   4 │        true
   5 │       false
julia> @select(df, :mean_smaller = mean({{All()}}) < median({{All()}}))
5×1 DataFrame
 Row │ mean_smaller
     │ Bool
─────┼──────────────
   1 │        false
   2 │         true
   3 │         true
   4 │        false
   5 │        false
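
To make the difference between {} broadcasting and the {{}} tuple more concrete, here is a rough sketch in plain Julia, without DataFrameMacros. It is not the code the macros actually generate, only an illustration of the idea under simplifying assumptions: the columns are small hand-written vectors (made-up values, not the random data above), and the names together and each are hypothetical.

```julia
using Statistics  # standard library, provides median

# Made-up stand-ins for a few month columns (plain vectors, no DataFrame).
jan = [1.0, 5.0, 2.0]
feb = [2.0, 6.0, 2.0]
mar = [3.0, 1.0, 2.0]
jul = [2.5, 4.0, 2.0]

# Idea behind {{...}}: for every row, the values of the selected columns
# arrive together as one tuple, so an aggregation runs across columns.
# Here `months` is a tuple such as (1.0, 2.0, 3.0) for the first row.
together = map((j, months...) -> j > median(months), jul, jan, feb, mar)

# Idea behind broadcast {...}: the same row-wise expression is applied to
# each selected column on its own, giving one result per input column.
each = [col .> 2.0 for col in (jan, feb, mar)]

println(together)  # one Bool per row
println(each)      # one Bool vector per column
```

The first result has one value per row because the three month columns are consumed together, while the second has one vector per column because each column is processed separately. That is the distinction the {{}} syntax was added for.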