The DataFrames.jl 0.22.0 release is out. You can check out the detailed release notes here and the release here.
First, let me thank the people who worked on it. There are very many contributors, so here I list only those who had a PR merged between the 0.21 and 0.22 releases (the list is very long even with this filter): Alexey Stukalov, Arsh Sharma, Baurzhan Muftakhidinov, Bogumił Kamiński, Daniel Molina, David Nies, Jacob Quinn, Jonas Schulze, Kevin Bonham, Logan Kilpatrick, Matthieu Gomez, Milan Bouchet-Valat, Morten Piibeleht, Nicholas Ritchie, Nick Eubank, Nils Gudat, Okon Samuel, Paulito Palmes, Peter Deffebach, Peter Shintech, Ronan Arraes Jardim Chagas, Takafumi Arakaki, Tom Kwong, Tyler Beason, Wolf Thomsen, Zhuo Jia Dai.
The 0.22 release is intended to be the last release before 1.0: we intend to make no further breaking changes and to release 1.0 relatively soon. Therefore you can safely assume that what works under 0.22 and is not deprecated (remember to use `--depwarn=error` in production code) will keep working long-term.
Also please keep in mind that display changes are not considered to be breaking.
The major changes in this release are (I am listing only breaking changes, as there are dozens of new features, too many to list here):
- the package is precompiled aggressively (precompilation takes about 30 seconds when the package is installed), but the "time to first data frame" is reduced
- PrettyTables.jl is now the default back-end for printing data frames to `text/plain`; the `splitcols` print option was removed and the output format has changed
- the list of provided `DataFrame` constructors has been significantly restricted
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made consistent and more flexible; in particular, a transformation function is now allowed to return multiple columns
- the dependency on CategoricalArrays.jl is deprecated (the 1.0 release will drop this dependency completely; this should also help with latency, although CategoricalArrays.jl has recently improved a lot in this area)
- in joins, passing `NaN` or real or imaginary `-0.0` in the `on` column now throws an error; passing `missing` throws an error unless the `matchmissing=:equal` keyword argument is passed
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments, `allowmissing` and `allowduplicates`
- in `describe`, a custom aggregation is now specified as `function => name`; the old `name => function` order is deprecated
- `All(args...)` is deprecated; use `Cols(args...)` instead (`All()` without arguments is still allowed)
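The snippet below is a minimal sketch of how several of the changes listed above look in practice, assuming DataFrames.jl 0.22 or later. The data frames and column names (`df`, `id`, `x`, `key`, `var`, `val`, `sq`, `half`, `total`) are made up purely for illustration.

```julia
using DataFrames  # assumes DataFrames.jl 0.22 or later

# Illustrative data; all names here are hypothetical.
df = DataFrame(id = [1, 2, 3], x = [1.0, 2.0, 3.0])

# A transformation may return several columns: ByRow yields one NamedTuple
# per row, and the AsTable target spreads its fields into new columns.
wide_x = transform(df, :x => ByRow(v -> (sq = v^2, half = v / 2)) => AsTable)

# Joins: a missing value in the key column throws an error unless
# matchmissing = :equal is passed, in which case missing matches missing.
left = DataFrame(key = [1, missing], a = ["p", "q"])
right = DataFrame(key = [1, missing], b = ["r", "s"])
joined = innerjoin(left, right, on = :key, matchmissing = :equal)

# unstack keeps row keys (id) and column keys (var) in order of first appearance,
# so the rows come out as id 2, 1 and the new columns as "b", "a".
long = DataFrame(id = [2, 2, 1, 1], var = ["b", "a", "b", "a"], val = [4, 3, 2, 1])
wide = unstack(long, :id, :var, :val)

# describe takes custom statistics in the new function => name order.
stats = describe(df, :mean, sum => :total)

# Cols(args...) replaces the deprecated All(args...) for combining selectors.
picked = select(df, Cols(:id, r"^x"))
```

Calling `innerjoin(left, right, on = :key)` on the same inputs without `matchmissing` would throw an error under the new rules.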
What is planned for the future (with no guarantee of what will make it into the 1.0 release, as many of these things are hard and experimental; I am listing only a limited number of things here, so see the issues and PRs in the package repository for the complete picture):
- remove all deprecations
- improve join performance
- use multithreading in split-apply-combine
- add a `proprow` specifier in transformations (like `nrow`, but calculating proportions)
- add a `RowNumber` virtual source column in transformations
- add an `AsVector` wrapper (like `AsTable`, but passing the arguments to a function as a vector)
- add a `where` function (like `filter`, but consistent with the other transformation functions)
- more flexible `stack`/`unstack` (in particular, unstacking on multiple columns and multiple values)
Ecosystem changes:
- If you maintain a package that has DataFrames.jl as a dependency, please update its `Project.toml` to allow version 0.22.
- I will update the tutorials soon and will post when this is done; however, some packages first need to be updated to allow DataFrames.jl 0.22.
- It is also recommended to update the dependency on CategoricalArrays.jl to its 0.9 release, as it significantly reduces the number of method invalidations it introduces.
I hope you enjoy using the new DataFrames.jl!