Release announcements for DataFrames.jl

DataFrames.jl 1.3.0 is out.

This is a major release, much bigger than recent ones. Hopefully, we have managed to fill in all the key missing parts of the package and make it feature complete.

Development towards 1.4.0 will continue by adding further features requested by users. I expect to release 1.4.0 around JuliaCon 2022 (unless something unexpected happens).

Here you can find the detailed release notes. See also NEWS.md for a list of relevant changes in the package.

Let me briefly summarize the most important changes and additions (in total, 125 PRs were merged since the 1.2.2 release, which is a lot). The summary assumes you know the functionality of the package; I will soon write a blog post explaining these changes for newcomers. The highlights are:

  • in groupby users now have more control over the resulting group order (previously groupby was implemented to produce, by default, the group ordering that is fastest to create, which is unintuitive in certain use cases; now the sort keyword argument is improved and gives the user more control if desired);
  • if a SubDataFrame was created with the : column selector (i.e. it contains all columns of its parent) then you can add new columns to such a data frame in all functions (the filtered-out rows get filled with missing);
  • delete! is deprecated in favor of deleteat!, fixing an inconsistency with what these functions are used for in Julia Base;
  • leftjoin! is added, allowing in-place joining of data frames (and it is fast);
  • in the source .=> transformation .=> destination form of the transformation minilanguage, the Cols, Between, All and Not selectors support broadcasting;
  • a bug is fixed in the handling of keyword arguments in sorting-related functions that in some cases allowed passing tuples (support for which was removed in the 1.0 release) and in other cases led to a stack overflow;
  • transformations of the form AsTable(...) => ByRow(sum) (and other standard reduction functions) are now fast even when many columns are selected (solving a long-standing performance bottleneck);
  • in the DataFrames.jl 1.4 release, on Julia 1.7 or newer, broadcasting assignment into an existing column of a data frame will replace it, while under Julia 1.6 or older it will be an in-place operation (this is an unfortunate difference in behavior between versions of Julia; it is impossible to implement it differently due to limitations of Julia Base, which is why this discrepancy is clearly announced now and the change will take effect in DataFrames.jl 1.4).
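
To make a few of the items above concrete, here is a minimal sketch, assuming DataFrames.jl 1.3 or newer. The data frames, column names (id, a, b, y, name, total) and values are made up for illustration and are not taken from the release notes.

```julia
using DataFrames

# Toy data, invented for this example.
df = DataFrame(id = [3, 1, 2, 1], a = 1:4, b = 5:8)

# groupby: sort=true orders groups by key, sort=false keeps the order of
# first appearance; the default (sort=nothing) picks the fastest method.
sorted_ids = [k.id for k in keys(groupby(df, :id, sort=true))]
appearance_ids = [k.id for k in keys(groupby(df, :id, sort=false))]

# Broadcasting a selector: apply sum to every column except :id.
sums = combine(df, Not(:id) .=> sum)

# AsTable(...) => ByRow(sum): row-wise sum over the selected columns.
totals = transform(df, AsTable([:a, :b]) => ByRow(sum) => :total)

# A SubDataFrame created with : as the column selector can receive a new
# column; rows filtered out of the view get missing in the parent.
sdf = view(df, df.a .> 2, :)
transform!(sdf, :a => ByRow(x -> 10x) => :y)

# deleteat! replaces the deprecated delete!: drop the first row in place.
deleteat!(df, 1)

# leftjoin! adds the columns of lookup to df in place; each row of df
# may match at most one row of lookup.
lookup = DataFrame(id = [1, 2], name = ["one", "two"])
leftjoin!(df, lookup, on = :id)
```

After running this, df holds the three remaining rows with the extra y and name columns, and the parent rows that were outside the view have missing in y.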

Before I wrap up, let me thank everyone who contributed to this release!
