We have released DataFrames.jl 0.21. This is a very big release with 102 PRs merged. Thanks to all who worked on it (the issues and the PRs). Due to the large number of contributors, I list here only the people who opened a merged PR since the 0.20 release: anandijain, DilumAluthge, jlumpe, jonas-schulze, nalimilan, nickeubank, non-Jedi, omus, oxinabox, pdeffebach, pearlzli, prosoitos, quinnj, ssikdar1, tkf, vonDonnerstein (I had to remove @ as Discourse disallows mentioning so many users in a single post).
The detailed release notes (with all issues and PRs closed) are here: Release v0.21.0 · JuliaData/DataFrames.jl · GitHub.
Here are the main highlights:
Breaking:
- complete redesign of `select`, `select!`, `transform`, `transform!` and `combine` (now we roughly match dplyr functionality in a single consistent system; the list of changes is too long to list them here - please read the docstrings of `select` and `combine`)
- deprecate `by`, `map` and `aggregate`
- deprecate `join` in favor of `innerjoin`, `outerjoin`, etc.
- columns can be indexed using strings, all functions are updated accordingly
- all types consistently support `names` which produces `Vector{String}` and `propertynames` which produces `Vector{Symbol}`
- Tables.rows iterates `DataFrameRows` to avoid compilation for very wide tables
- remove `lastindex` without a dimension
- deprecate `names=true` in `eachcol`
- change `ArgumentError` to `DimensionMismatch` in several methods (where it was more suitable)
- give `ErrorException` when trying to iterate `AbstractDataFrame`
- change `⍰` to `?` when showing a DataFrame and type display improvements
- make `id_vars` go first in `stack`
- add `groupcols` and `valuecols` functions; deprecate `groupvars`
- deprecate passing tuple of columns to `sort`
- rename `deleterows!` to `delete!`
- change `eltype` of `NamedTuple` from `DataFrameRow`
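To give a feel for the redesigned API, here is a minimal sketch of a few of the breaking changes listed above. It assumes DataFrames.jl 0.21; the data frames and column names (`df`, `other`, `id`, `g`, `x`, `y`) are made up for illustration, and the docstrings of `select` and `combine` remain the authoritative description of all the rules.

```julia
using DataFrames

# Hypothetical sample data, not taken from the release notes
df = DataFrame(id = [1, 2, 3, 4], g = ["a", "a", "b", "b"], x = [1.0, 2.0, 3.0, 4.0])

# select keeps only the requested columns;
# `source => function => target` computes a new column from an existing one
s = select(df, :id, :x => (v -> 2 .* v) => :x2)

# transform keeps all existing columns and adds the new one;
# ByRow applies the function to each element instead of the whole column
t = transform(df, :x => ByRow(sqrt) => :x_sqrt)

# combine on a GroupedDataFrame covers what the deprecated `by` did
c = combine(groupby(df, :g), :x => sum => :x_sum)

# innerjoin replaces the deprecated `join` for inner joins
other = DataFrame(id = [1, 3], y = ["p", "q"])
j = innerjoin(df, other, on = :id)

# columns can be indexed with strings as well as with Symbols
same = df[!, "x"] == df[!, :x]

# names returns strings, propertynames returns Symbols
colnames = names(df)
colsyms = propertynames(df)
```

The same `source => function => target` pair syntax is shared by `select`, `transform` and `combine`, which is what makes the new system consistent across these functions.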
New features:
- allow `:union` as `cols` kwarg in `push!` and `append!`; also allow autopromotion of column eltypes
- `DataFrameRows` and `DataFrameColumns` support Tables.jl interface
- `names` allows column selector as a second positional argument
- `variable_eltype` kwarg added to `stack`
- improve performance of `unstack`
- add `convert` and `merge` to `DataFrameRow`
- define `summary` for `GroupedDataFrame`
- returning an empty table in `combine` drops a group
- `insertcols!` now allows passing multiple columns
- improve indexing of `GroupedDataFrame` with keys; make such lookup fast (in consequence DataFrames.jl now provides a fast lookup!)
- define consistent rules of pseudo-broadcasting in DataFrames.jl (in particular unwrap `Ref` and `0`-dimensional arrays)
- re-export Tables.jl
- allow `Pair` argument in `filter` and `filter!`
- improve `flatten`
- add `haskey` to `GroupedDataFrame` and `GroupKey`
- add `eltypes` kwarg to `show`
- add `mapcols!` and `repeat!`, fix corner cases of `repeat`
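A few of the new features above can be sketched as follows. This is only an illustration assuming DataFrames.jl 0.21; the data frame and its columns (`g`, `x`, `note`) are hypothetical, and the exact set of accepted key types for `GroupedDataFrame` lookup should be checked in the documentation.

```julia
using DataFrames

# Hypothetical sample data
df = DataFrame(g = [1, 1, 2], x = [10, 20, 30])

# cols = :union adds columns that are missing from the data frame
# and fills the gaps in existing rows with `missing`
push!(df, (g = 2, x = 40, note = "new"), cols = :union)

# filter accepts a `column => predicate` Pair;
# the predicate is called on the values of that column row by row
big = filter(:x => x -> x > 15, df)

# names accepts a column selector as a second positional argument
xcols = names(df, r"^x")

# a GroupedDataFrame supports haskey and lookup by key
# (a NamedTuple key is assumed here)
gdf = groupby(df, :g)
found = haskey(gdf, (g = 2,))
grp = gdf[(g = 2,)]
```

Key-based lookup is what the note about fast lookup refers to: once a data frame is grouped, retrieving the group for a given key does not require scanning the rows again.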
Bugfixes:
- fix grouped maximum, minimum, var and std with only missing values
- fix `combine` when different functions return groups of different lengths
- fix `combine` when `DataFrameRow` was returned
- fix the `groups` field values when `GroupedDataFrame` is returned by `combine` (previously `map`)
- respect `IOContext` of `io` when printing
- fix eltype in `stack` with `view=true`
- fix circular ref bug in `show`; improve showing of special types
Other:
- many documentation improvements
- improve organization of codebase
- fix `BoundsError` messages
- remove `readtable` and `writetable` from deprecated
- update up to Julia 1.5 nightly
The plans for the future are the following. Ideally the next release is 1.0 and we do not include any breaking changes (the reality might turn out to be different though).
The key objectives between the 0.21 release and the 1.0 release are:
- documentation improvements
- decouple DataFramesBase.jl as a lightweight low-level API package
- adding requested non-breaking functionality
- find as many bugs as possible before 1.0 release
If this goes as planned, we should make the 1.0 release in 3 to 6 months from now (depending on how things progress and on user feedback).
I will also update https://github.com/bkamins/Julia-DataFrames-Tutorial soon (we need other packages to sync with DataFrames.jl release 0.21 before this). I will post when this is done.