The DataFrames.jl 0.22.0 release is out. You can check out the detailed release notes here and the release here.
First, let me thank the people who worked on it. There are a great many contributors, so here I list only those who contributed a merged PR between the 0.21 and 0.22 releases (the list is very long even with this filter): Alexey Stukalov, Arsh Sharma, Baurzhan Muftakhidinov, Bogumił Kamiński, Daniel Molina, David Nies, Jacob Quinn, Jonas Schulze, Kevin Bonham, Logan Kilpatrick, Matthieu Gomez, Milan Bouchet-Valat, Morten Piibeleht, Nicholas Ritchie, Nick Eubank, Nils Gudat, Okon Samuel, Paulito Palmes, Peter Deffebach, Peter Shintech, Ronan Arraes Jardim Chagas, Takafumi Arakaki, Tom Kwong, Tyler Beason, Wolf Thomsen, Zhuo Jia Dai.
The 0.22 release is intended to be the last release before 1.0. Our intention is to make no further breaking changes and to release 1.0 relatively soon. Therefore you can safely assume that what works under 0.22 and is not deprecated (remember to use `--depwarn=error` in production code) will work long-term.
Also please keep in mind that display changes are not considered to be breaking.
The major changes in this release are listed below (I list only breaking changes, as there are dozens of newly added features, too many to list here):
- the package is precompiled aggressively (this means that precompilation takes ~30 seconds when the package is installed), but “time to first data frame” is reduced
- PrettyTables.jl is now the default back-end to print DataFrames to `text/plain`; the print option `splitcols` was removed and the output format was changed
- the list of provided `DataFrame` constructors has been significantly restricted
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made consistent and more flexible; in particular, a transformation function is now allowed to return multiple columns
- the dependency on CategoricalArrays.jl is deprecated (which means that in the 1.0 release we will drop this dependency completely; this should also help with latency in particular, though CategoricalArrays.jl got much better in this area recently)
- in joins, passing `NaN` or real or imaginary `-0.0` in the `on` column now throws an error; passing `missing` throws an error unless the `matchmissing=:equal` keyword argument is passed
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments, `allowmissing` and `allowduplicates`
- in `describe` the specification of a custom aggregation is now `function => name`; the old `name => function` order is deprecated
- `All(args...)` is deprecated, use `Cols(args...)` instead (except that `All()` is still allowed)
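A few of the changes listed above can be sketched as follows. This is only an illustration under DataFrames.jl 0.22: the data frames `df`, `left`, `right`, and `long` and their column names are made up for the example and do not come from the release notes.

```julia
using DataFrames

# hypothetical example data
df = DataFrame(id = [1, 2, 3], x = [1.0, 2.0, 3.0])

# a transformation function may return multiple columns, here as a NamedTuple
# that is expanded into columns by the AsTable destination
res = combine(df, :x => (v -> (lo = minimum(v), hi = maximum(v))) => AsTable)

# Cols(args...) replaces the deprecated All(args...)
sel = select(df, Cols(:id, r"x"))

# custom aggregation in describe uses the function => name order
stats = describe(df, :mean, sum => :total, cols = :x)

# joining on a column that contains missing requires matchmissing = :equal
left = DataFrame(key = [1, missing], a = ["p", "q"])
right = DataFrame(key = [1, missing], b = ["r", "s"])
joined = innerjoin(left, right, on = :key, matchmissing = :equal)

# unstack keeps row and column keys in the order of their first appearance
long = DataFrame(id = [2, 2, 1, 1],
                 variable = ["b", "a", "b", "a"],
                 value = [1, 2, 3, 4])
wide = unstack(long, :id, :variable, :value)
```

Without `matchmissing = :equal` the join above throws an error, which is the breaking part of the change.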
What is planned for the future (without guarantees about what will make it into the 1.0 release, as many of these things are hard and experimental; I am listing only a limited number of things here, see the issues/PRs in the package repository for a complete view):
- remove all deprecations
- improve join performance
- use multithreading in split-apply-combine
- add `proprow` specifier in transformations (like `nrow` but calculating proportions)
- add `RowNumber` virtual source column in transformations
- add `AsVector` wrapper (like `AsTable` but passing arguments as a vector to a function)
- add `where` function (like `filter` but consistent with other transformation functions)
- more flexible `stack`/`unstack` (in particular unstacking on multiple columns and multiple values)
Ecosystem changes:
- If you are maintaining a package that has DataFrames.jl as a dependency, please update its Project.toml to allow version 0.22.
- I will update the tutorials soon (I will post when it is done; however, first some packages need to be updated to allow DataFrames.jl 0.22).
- It is also recommended to update the dependency on CategoricalArrays.jl to the 0.9 release of this package, as it significantly reduces the number of method invalidations introduced.
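For package maintainers, the compat update could look roughly like the fragment below. It is a sketch only: the exact bounds are an assumption and depend on which releases your package supports.

```toml
[compat]
# allow the new release; to keep supporting older releases, extend your
# existing bound instead, e.g. "0.21, 0.22" (illustrative values)
DataFrames = "0.22"
# only if your package depends on CategoricalArrays.jl directly
CategoricalArrays = "0.9"
```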
I hope you will enjoy using the new DataFrames.jl!