[ANN] DataFramesMeta 0.7.0 release

I am happy to announce a new release of DataFramesMeta. This release contains three important additions:

  1. @byrow. Being able to perform transformations by-row, rather than using broadcasting, has been a long-requested feature for DataFramesMeta. This release introduces @byrow, a macro-like syntax used inside DataFramesMeta macros.
julia> using DataFramesMeta

julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> @transform df @byrow c = :a == 1 ? 100 : 200
3×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      4    100
   2 │     2      5    200
   3 │     3      6    200

It can be used inside @transform, @select, @where, @orderby, and @combine (though it’s not very useful in @combine).

It can also be used in @with, where it’s roughly equivalent to map.

julia> @with df @byrow :a * :b
3-element Vector{Int64}:
  4
 10
 18
  2. @eachrow!, an in-place version of @eachrow. The key benefit of @eachrow is that it creates a fast iterator through the rows of a data frame, which matters because a plain for row in eachrow(df) loop is slow in base DataFrames.

Unfortunately, @eachrow always returns a new data frame, which cancels out much of the speed gained from the fast iterator. @eachrow! fixes this by modifying the data frame in place.

julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> @eachrow! df begin 
           :a = :b * 100
       end
3×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6

julia> df
3×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │   400      4
   2 │   500      5
   3 │   600      6
  3. Multiple operations in a block. In implementing @byrow as a macro-flag, we realized that, due to Julia’s parsing, @transform(df, @byrow y = f(:x), @byrow z = g(:x)) wouldn’t work without additional parentheses. So we needed a new syntax to be able to use macro-flags (@byrow and future additions). The solution was to allow multiple operations in a block.
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> @transform df begin 
           c = :a .+ 100
           d = :a .* :b
       end
3×4 DataFrame
 Row │ a      b      c      d     
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      4    101      4
   2 │     2      5    102     10
   3 │     3      6    103     18

For people who perform multiple transformations by-row, we allow @byrow at the top of the block to signal that all transformations are applied by-row.

julia> @transform df @byrow begin 
           c = "Person $(:a)"
           d = :a * :b
       end
3×4 DataFrame
 Row │ a      b      c         d     
     │ Int64  Int64  String    Int64 
─────┼───────────────────────────────
   1 │     1      4  Person 1      4
   2 │     2      5  Person 2     10
   3 │     3      6  Person 3     18

Why doesn’t DataFramesMeta.jl make row transformations the default?

The improvements in this release of DataFramesMeta center on making it easier to work with a data frame by-row. So why not make this the default? Ultimately, DataFramesMeta’s goal is to provide an easier syntax for working with DataFrames’ source => fun => dest mini-language. Because DataFrames.transform and DataFrames.select act on whole columns, making operations by-row by default could make it difficult for users to switch between the two syntaxes. However, I hope to keep making it easier for people to work with DataFrames as they like.
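
To make the distinction concrete, here is a minimal sketch that uses only DataFrames.jl (not DataFramesMeta) to contrast a whole-column transformation with a by-row one in the source => fun => dest mini-language. The destination column name :d and the anonymous functions are chosen purely for illustration. The ByRow call is roughly what a @byrow transformation corresponds to; the exact expansion DataFramesMeta generates is an assumption here, not something shown in this post.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Whole-column form: the function receives the full vectors `a` and `b`,
# so element-wise work needs explicit broadcasting (`.*`).
by_column = transform(df, [:a, :b] => ((a, b) -> a .* b) => :d)

# By-row form: ByRow wraps the function so it is called once per row
# with scalar values, and no broadcasting dots are needed.
by_row = transform(df, [:a, :b] => ByRow((a, b) -> a * b) => :d)
```

Both calls produce the same :d column. The difference is only in what the function sees (whole vectors versus one row's values), which is the default that DataFramesMeta keeps aligned with DataFrames.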

Future improvements

In the pipeline for the future are:

  • Adding @subset and @subset!, and deprecating @where, to improve consistency with base DataFrames
  • Allowing multi-argument selectors in @select (e.g. Between, Not, etc.)
  • Adding more convenience macro-flags, such as @passmissing and @missingfalse, to make working with missing values more convenient
  • Quality-of-life improvements, such as using :x on the LHS of expressions, as in the recently released DFMacros.jl

Please file issues if you encounter bugs or want to propose new features!
