DataFrames.jl 0.22.0 release is out. You can check out the detailed release notes here and the release here.
First let me thank the people who worked on it. There are many contributors, so here I list only those who contributed a merged PR between the 0.21 and 0.22 releases (the list is very long even with this filter): Alexey Stukalov, Arsh Sharma, Baurzhan Muftakhidinov, Bogumił Kamiński, Daniel Molina, David Nies, Jacob Quinn, Jonas Schulze, Kevin Bonham, Logan Kilpatrick, Matthieu Gomez, Milan Bouchet-Valat, Morten Piibeleht, Nicholas Ritchie, Nick Eubank, Nils Gudat, Okon Samuel, Paulito Palmes, Peter Deffebach, Peter Shintech, Ronan Arraes Jardim Chagas, Takafumi Arakaki, Tom Kwong, Tyler Beason, Wolf Thomsen, Zhuo Jia Dai.
The 0.22 release is intended to be the last release before 1.0: we do not plan to make further breaking changes, and we intend to make a 1.0 release relatively soon. Therefore you can safely assume that what works under 0.22 and is not deprecated (remember to use `--depwarn=error` in production code) will work long-term.
Also please keep in mind that display changes are not considered to be breaking.
The major changes in this release are (I am listing only breaking changes, as there are dozens of new functionalities, too many to list here):
- the package is precompiled aggressively (this means that precompilation takes ~30 seconds when the package is installed), but “time to first data frame” will be reduced
- PrettyTables.jl is now the default back-end to print data frames to `text/plain`; the print option `splitcols` was removed and the output format was changed
- the list of provided `DataFrame` constructors has been significantly restricted
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made consistent and more flexible; in particular, it is now allowed to return multiple columns from a transformation function
- the dependency on CategoricalArrays.jl is deprecated (which means that in the 1.0 release we will completely drop this dependency; this should also help with latency in particular, though CategoricalArrays.jl has recently improved a lot in this area)
- in joins, passing `NaN` or a real or imaginary `-0.0` in the `on` column now throws an error; passing `missing` throws an error unless the `matchmissing=:equal` keyword argument is passed
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments, `allowmissing` and `allowduplicates`
- in `describe`, a custom aggregation is now specified as `function => name`; the old `name => function` order is deprecated
- `All(args...)` is deprecated, use `Cols(args...)` instead (except that `All()` is still allowed)
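To make a few of the breaking changes above concrete, here is a minimal sketch, assuming DataFrames.jl 0.22 is installed. It touches `transform` returning multiple columns, `innerjoin` with `matchmissing`, `unstack`, `describe` and `Cols`; using `AsTable` as the target of a transformation is my assumption of a convenient way to return several columns, and all data and variable names are made up for illustration.

```julia
using DataFrames  # assumes DataFrames.jl 0.22 is installed

# 1. A transformation may return several columns: a function returning a
#    NamedTuple per row, combined with the AsTable target, is expanded into
#    one new column per NamedTuple field (sq and cube here).
df = DataFrame(x = [1, 2, 3])
powers = transform(df, :x => ByRow(x -> (sq = x^2, cube = x^3)) => AsTable)

# 2. Joins: `missing` in the `on` column throws an error by default...
lhs = DataFrame(id = [1, missing], a = ["p", "q"])
rhs = DataFrame(id = [1, missing], b = ["r", "s"])
join_failed = try
    innerjoin(lhs, rhs, on = :id)
    false
catch
    true
end
# ...unless missing keys are explicitly declared equal to each other.
matched = innerjoin(lhs, rhs, on = :id, matchmissing = :equal)

# 3. unstack: row keys ("y" before "x") and column keys ("b" before "a")
#    keep the order in which they first appear in the long data frame.
long_df = DataFrame(id = ["y", "y", "x", "x"],
                    variable = ["b", "a", "b", "a"],
                    value = [10, 20, 30, 40])
wide_df = unstack(long_df, :id, :variable, :value)

# 4. describe: a custom statistic is passed as `function => name`
#    (the old `name => function` order is deprecated).
desc = describe(DataFrame(v = [1, 2, 3]), :mean, sum => :total)

# 5. Cols(args...) replaces All(args...) for combining column selectors;
#    here a single column name is combined with a regular expression.
sel = select(DataFrame(a = [1], b1 = [2], b2 = [3], c = [4]), Cols(:a, r"^b"))
```

Running such a script with `--depwarn=error` is a quick way to confirm that none of the deprecated forms (for example `All(:a, r"^b")` or `:total => sum` in `describe`) are still in use.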
What is planned for the future (without guarantees about what will make it into the 1.0 release, as many of these things are hard and experimental; I am listing here only a limited number of things, see issues/PRs in the package repository for a complete view):
- remove all deprecations
- improve join performance
- use multithreading in split-apply-combine
- add a `proprow` specifier in transformations (like `nrow` but calculating proportions)
- add a `RowNumber` virtual source column in transformations
- add an `AsVector` wrapper (like `AsTable` but passing arguments as a vector to a function)
- add a `where` function (like `filter` but consistent with other transformation functions)
- more flexible `stack`/`unstack` (in particular unstacking on multiple columns and multiple values)
Ecosystem changes:
- If you are maintaining a package that has DataFrames.jl as a dependency, please update its Project.toml to allow version 0.22.
- I will update the tutorials soon (I will post when it is done; however, first some packages need to be updated to allow DataFrames.jl 0.22).
- It is also recommended to update the dependency on CategoricalArrays.jl to its 0.9 release, as it significantly reduces the number of method invalidations it introduces.
I hope you will enjoy using the new DataFrames.jl!