DataFramesMeta.jl and the state of the DataFrames ecosystem

jmboehm · March 20, 2020, 2:14pm

I’ve found the existing macros/functions somewhat cumbersome for my purposes – or perhaps for my habits, which come from years of using Stata. Two things in particular:

The split-apply-combine operations automatically aggregate DataFrames to the group level when the function returns scalars. That does not work well with an approach where you first construct all variables at a more disaggregated level and then aggregate up. In my experience, this approach is less prone to human error than the alternative, which is to construct aggregates using by operations and then possibly join some of them together.
Making split-apply-combine operations work the way you would like when the data contain missings is non-trivial. It usually involves a mixture of skipmissing and completecases calls, and by the time I’ve solved it, I long for Stata, which uses only cases where all variables are nonmissing. Of course that’s not always what you want, but it’s pretty straightforward to get what you want.
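
To make the two points concrete, here is a minimal sketch of that workflow. It assumes a recent DataFrames.jl release in which transform accepts a grouped data frame (this postdates the by function mentioned above). The toy data and the column names firm, sales and cost are hypothetical, and this is only one possible way to approximate the Stata behaviour, not a recommended API.

```julia
using DataFrames, Statistics

# Hypothetical toy data: two firms, one missing sales value.
df = DataFrame(firm  = ["a", "a", "b", "b"],
               sales = [1.0, 3.0, missing, 4.0],
               cost  = [0.5, 1.0, 2.0, 3.0])

# Stata-style complete-case step: keep only the rows in which
# every variable used below is nonmissing.
cc = dropmissing(df, [:sales, :cost])

# Roughly like Stata's `egen mean_sales = mean(sales), by(firm)`:
# the group mean is repeated on every row of the group, so the data
# stay at the disaggregated level instead of collapsing to one row per firm.
wide = transform(groupby(cc, :firm), :sales => mean => :mean_sales)

# Aggregate up only at the very end: one row per firm.
agg = combine(groupby(wide, :firm), :mean_sales => first => :mean_sales)
```

Here wide has the same number of rows as cc, while agg has one row per firm. Dropping incomplete rows up front is an assumption that mimics Stata's default; when only some columns should be restricted, pass a narrower column list to dropmissing.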

Again, I do not want to put the blame on the design of the split-apply-combine macros in these packages; it’s just that I seem to be very slow when I use them.

So my goal is to implement something like Stata’s syntax for the basic data wrangling operations, to help me solve my two-language problem (data cleaning in Stata, model estimation in Julia). Julia’s metaprogramming makes that relatively easy, and it’s also a good exercise for me. So far I’ve mostly been using @with from DataFramesMeta (beyond the functions in DataFrames.jl), and I would be happy if it continued to be supported (or possibly were implemented within DataFrames).
