We have released DataFrames.jl 0.21. This is a very big release with 102 PRs merged. Thanks to everyone who worked on it (on both the issues and the PRs). Because the number of contributors is large, I list here only the people who opened a merged PR since the 0.20 release: anandijain, DilumAluthge, jlumpe, jonas-schulze, nalimilan, nickeubank, non-Jedi, omus, oxinabox, pdeffebach, pearlzli, prosoitos, quinnj, ssikdar1, tkf, vonDonnerstein (I had to remove the @ signs, as Discourse disallows mentioning so many users in a single post).
The detailed release notes (with all issues and PRs closed) are here: Release v0.21.0 · JuliaData/DataFrames.jl · GitHub.
Here are the main highlights:
Breaking:
- complete redesign of `select`, `select!`, `transform`, `transform!` and `combine` (now we roughly match dplyr functionality in a single consistent system; the list of changes is too long to list them here - please read the docstrings of `select` and `combine`)
- deprecate `by`, `map` and `aggregate`
- deprecate `join` in favor of `innerjoin`, `outerjoin`, etc.
- columns can be indexed using strings, all functions are updated accordingly
- all types consistently support `names` which produces `Vector{String}` and `propertynames` which produces `Vector{Symbol}`
- Tables.rows iterates `DataFrameRows` to avoid compilation for very wide tables
- remove `lastindex` without a dimension
- deprecate `names=true` in `eachcol`
- change `ArgumentError` to `DimensionMismatch` in several methods (where it was more suitable)
- give `ErrorException` when trying to iterate `AbstractDataFrame`
- change `⍰` to `?` when showing a DataFrame and type display improvements
- make `id_vars` go first in `stack`
- add `groupcols` and `valuecols` functions; deprecate `groupvars`
- deprecate passing tuple of columns to `sort`
- rename `deleterows!` to `delete!`
- change `eltype` of `NamedTuple` from `DataFrameRow`
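To give a feel for the most visible breaking changes, here is a small sketch. The data frames `df` and `labels`, their column names and their values are made up for illustration; only the function names come from the list above, and the docstrings remain the authoritative reference for the exact rules.

```julia
using DataFrames

# Hypothetical example data (not from the release notes).
df = DataFrame(g = [1, 1, 2], x = [10, 20, 30])

# Columns can be indexed with strings as well as with Symbols.
same_column = df[!, "x"] == df[!, :x]

# `names` returns a Vector{String}; `propertynames` returns a Vector{Symbol}.
string_names = names(df)
symbol_names = propertynames(df)

# The redesigned functions accept `source => function => destination` pairs.
doubled = transform(df, :x => (v -> 2 .* v) => :x2)      # keeps g and x, adds x2
roots   = select(df, :x => ByRow(sqrt) => :x_sqrt)       # keeps only x_sqrt
totals  = combine(groupby(df, :g), :x => sum => :x_sum)  # one way to replace `by`

# `innerjoin` is one of the replacements for the deprecated `join`.
labels = DataFrame(g = [1, 2], label = ["a", "b"])
joined = innerjoin(df, labels, on = :g)
```

The function in a pair receives the whole column by default; wrapping it in `ByRow` applies it to each element instead.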
New features:
- allow `:union` as `cols` kwarg in `push!` and `append!`; also allow autopromotion of column eltypes
- `DataFrameRows` and `DataFrameColumns` support Tables.jl interface
- `names` allows column selector as a second positional argument
- `variable_eltype` kwarg added to `stack`
- improve performance of `unstack`
- add `convert` and `merge` to `DataFrameRow`
- define `summary` for `GroupedDataFrame`
- returning an empty table in `combine` drops a group
- `insertcols!` now allows passing multiple columns
- improve indexing of `GroupedDataFrame` with keys; make such lookup fast (in consequence DataFrames.jl now provides a fast lookup!)
- define consistent rules of pseudo-broadcasting in DataFrames.jl (in particular unwrap `Ref` and 0-dimensional arrays)
- re-export Tables.jl
- allow `Pair` argument in `filter` and `filter!`
- improve `flatten`
- add `haskey` to `GroupedDataFrame` and `GroupKey`
- add `eltypes` kwarg to `show`
- add `mapcols!` and `repeat!`, fix corner cases of `repeat`
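A few of the new features are easier to see in code. The sketch below uses a made-up data frame `df` with hypothetical columns `a`, `b` and `c`. The tuple form of the group key is my assumption about the simplest way to exercise the key lookup; please check the docstrings for the full set of accepted key types.

```julia
using DataFrames

# Hypothetical example data (not from the release notes).
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# `filter` accepts a `column => predicate` pair.
large = filter(:a => (v -> v > 1), df)

# `names` takes a column selector as a second positional argument.
without_b = names(df, Not(:b))

# With `cols = :union` a pushed row may introduce a new column;
# rows that existed before get `missing` in that column.
push!(df, (a = 4, b = "w", c = 1.5); cols = :union)

# A GroupedDataFrame can be queried by group key.
gdf = groupby(df, :b)
has_w = haskey(gdf, ("w",))
w_rows = gdf[("w",)]
```

Note that `push!` mutates `df` in place, so `large` and `without_b` above reflect the data frame as it was before the fourth row and the column `c` were added.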
Bugfixes:
- fix grouped maximum, minimum, var and std with only missing values
- fix `combine` when different functions return groups of different lengths
- fix `combine` when `DataFrameRow` was returned
- fix the `groups` field values when `GroupedDataFrame` is returned by `combine` (previously `map`)
- respect `IOContext` of `io` when printing
- fix eltype in `stack` with `view=true`
- fix circular ref bug in `show`; improve showing of special types
Other:
- many documentation improvements
- improve organization of codebase
- fix `BoundsError` messages
- remove `readtable` and `writetable` from deprecated
- update up to Julia 1.5 nightly
The plans for the future are as follows. Ideally the next release will be 1.0 and will not include any breaking changes (reality might turn out differently, though).
The key objectives between the 0.21 release and the 1.0 release are:
- documentation improvements
- decouple DataFramesBase.jl as a lightweight low-level API package
- adding requested non-breaking functionality
- find as many bugs as possible before 1.0 release
If this goes as planned, we should release 1.0 in 3 to 6 months from now (depending on how things progress and on user feedback).
I will also update https://github.com/bkamins/Julia-DataFrames-Tutorial soon (we need other packages to sync with DataFrames.jl release 0.21 before this). I will post when this is done.