Release announcements for DataFrames.jl

We have DataFrames.jl release 0.21. This is a very big release with 102 PRs merged. Thanks to everyone who worked on it (on both the issues and the PRs). Due to the large number of contributors, I list here only the people who opened a merged PR since the 0.20 release: anandijain, DilumAluthge, jlumpe, jonas-schulze, nalimilan, nickeubank, non-Jedi, omus, oxinabox, pdeffebach, pearlzli, prosoitos, quinnj, ssikdar1, tkf, vonDonnerstein (I had to remove @ as Discourse disallows mentioning so many users in a single post :smile:).

The detailed release notes (with all issues and PRs closed) are here: Release v0.21.0 · JuliaData/DataFrames.jl · GitHub.

Here are the main highlights:

Breaking:

  • complete redesign of select, select!, transform, transform! and combine (we now roughly match dplyr functionality in a single consistent system; the list of changes is too long to give here, so please read the docstrings of select and combine)
  • deprecate by, map and aggregate
  • deprecate join in favor of innerjoin, outerjoin, etc.
  • columns can be indexed using strings, all functions are updated accordingly
  • all types consistently support names, which returns Vector{String}, and propertynames, which returns Vector{Symbol}
  • Tables.rows iterates DataFrameRows to avoid compilation for very wide tables
  • remove lastindex without a dimension
  • deprecate names=true in eachcol
  • change ArgumentError to DimensionMismatch in several methods (where it was more suitable)
  • throw an ErrorException when trying to iterate an AbstractDataFrame
  • use ? to mark column types that allow missing values when showing a DataFrame, plus other type display improvements
  • make id_vars go first in stack
  • add groupcols and valuecols functions; deprecate groupvars
  • deprecate passing tuple of columns to sort
  • rename deleterows! to delete!
  • change eltype of a NamedTuple created from a DataFrameRow
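A minimal sketch of how several of the breaking changes above look in practice: string column indexing, names vs propertynames, the source => function => target syntax in transform and combine, and innerjoin replacing join. It assumes DataFrames.jl 0.21 or later is installed (details may differ between versions); the tables `df` and `lookup` and their column names are made up for illustration.

```julia
using DataFrames

# hypothetical example data
df = DataFrame(id = [1, 1, 2, 2], x = [1.0, 2.0, 3.0, 4.0])

# columns can be indexed with strings as well as symbols
same = df[!, "x"] == df[!, :x]

# names returns Vector{String}, propertynames returns Vector{Symbol}
n = names(df)
p = propertynames(df)

# transform with the source => function => target syntax
df2 = transform(df, :x => (v -> 2 .* v) => :x2)

# combine on a GroupedDataFrame replaces by/aggregate
gdf = groupby(df, :id)
res = combine(gdf, :x => sum => :x_sum)

# innerjoin (and outerjoin, leftjoin, ...) replace join(...; kind = ...)
lookup = DataFrame(id = [1, 2], label = ["a", "b"])
joined = innerjoin(res, lookup, on = :id)
```

The row order of a join result is not something to rely on, so sort it (e.g. `sort(joined, :id)`) if order matters.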

New features:

  • allow :union as cols kwarg in push! and append!; also allow autopromotion of column eltypes
  • DataFrameRows and DataFrameColumns support Tables.jl interface
  • names allows column selector as a second positional argument
  • variable_eltype kwarg added to stack
  • improve performance of unstack
  • add convert and merge to DataFrameRow
  • define summary for GroupedDataFrame
  • returning an empty table in combine drops a group
  • insertcols! now allows passing multiple columns
  • improve indexing of GroupedDataFrame with keys and make such lookup fast (as a consequence, DataFrames.jl now provides fast lookup!)
  • define consistent rules of pseudo-broadcasting in DataFrames.jl (in particular unwrap Ref and 0-dimensional arrays)
  • re-export Tables.jl
  • allow Pair argument in filter and filter!
  • improve flatten
  • add haskey to GroupedDataFrame and GroupKey
  • add eltypes kwarg to show
  • add mapcols! and repeat!, fix corner cases of repeat
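A short sketch of a few of the new features listed above: push! with cols = :union, filter with a Pair, key lookup in a GroupedDataFrame with haskey and indexing, and mapcols!. It assumes DataFrames.jl 0.21 or later; the data and column names are hypothetical.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

# cols = :union: the new column :c is added (missing for earlier rows),
# :b, absent from the pushed row, gets missing, and eltypes are promoted
push!(df, (a = 5, c = "new"), cols = :union)

# filter accepts a source => predicate Pair
big = filter(:a => x -> x > 1, df)

# fast lookup of groups by key (Tuple or NamedTuple)
gdf = groupby(DataFrame(k = [1, 1, 2], v = [10, 20, 30]), :k)
has2 = haskey(gdf, (2,))
sub = gdf[(k = 1,)]

# mapcols! replaces every column in place with f(column)
m = DataFrame(x = [1, 2], y = [3, 4])
mapcols!(col -> col .* 10, m)
```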

Bugfixes:

  • fix grouped maximum, minimum, var and std with only missing values
  • fix combine when different functions return groups of different lengths
  • fix combine when DataFrameRow was returned
  • fix the groups field values when a GroupedDataFrame is returned by combine (previously map)
  • respect IOContext of io when printing
  • fix eltype in stack with view=true
  • fix circular ref bug in show; improve showing of special types

Other:

  • many documentation improvements
  • improve organization of codebase
  • fix BoundsError messages
  • remove readtable and writetable from deprecated
  • ensure compatibility with Julia up to the 1.5 nightly

The plans for the future are as follows. Ideally the next release will be 1.0 and will not include any breaking changes (reality might turn out to be different, though).

The key objectives between the 0.21 release and the 1.0 release are:

  • documentation improvements
  • decouple DataFramesBase.jl as a lightweight low-level API package
  • adding requested non-breaking functionality
  • find as many bugs as possible before 1.0 release

If this goes as planned, we will make the 1.0 release in 3 to 6 months from now (depending on how things progress and on user feedback).

I will also update https://github.com/bkamins/Julia-DataFrames-Tutorial soon (we need other packages to sync with DataFrames.jl release 0.21 before this). I will post when this is done.
