DataFrames.jl v0.19.0 has just been released. It is a major release on the way to DataFrames.jl 1.0 (we cannot get there yet, as we first have to go through a deprecation cycle).
The number of changes is significant and includes:
API changes:
- allow `Regex` indexing of columns
- allow `Not` from InvertedIndices.jl indexing of rows and columns
- add `!` indexing of rows of `AbstractDataFrame`
- deprecate indexing with column or columns only (like `df[:a]` or `df[1:2]`)
- define target rules for `getindex`, `getproperty`, `setindex!`, and `setproperty!` for `AbstractDataFrame` and `DataFrameRow` (in this release old behavior is deprecated; in the next release it will get replaced by target functionality)
- add indexing using `CartesianIndex{2}` for `AbstractDataFrame`
- full support of broadcasting for `AbstractDataFrame`
- support for broadcasting assignment for `DataFrameRow`
- `keys(::DataFrameRow)` now returns a `Tuple` of column names
- added `get` and `map` methods for `DataFrameRow`
- `categorical!` now accepts columns that contain `missing` values
- `get` and `haskey` for `AbstractDataFrame` are deprecated now
- `empty!` for `DataFrame` is deprecated now
- add `hasproperty` for `AbstractDataFrame`
Fixes:
- improved showing of `DataFrameRow` with zero columns
- fix `combine` with aggregation when `skipmissing=true`
Minor changes:
- improvements in error messages and types of thrown exceptions on error
- various documentation improvements
- improved `getindex` speed for vector of `Bool` indexing
- remove InteractiveUtils.jl dependency
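A few of the smaller API additions listed above can be sketched as follows. This is a minimal illustration rather than an excerpt from the package documentation; the data frame and the variable names (`cell`, `row`, `fallback`, and so on) are made up for the example, and `hasproperty` assumes a Julia version that provides it in Base.

```julia
using DataFrames

df = DataFrame(x1 = 1:3, x2 = 2:4, y = 'a':'c')

# Index a single cell with a two-dimensional CartesianIndex (row 2, column 1).
cell = df[CartesianIndex(2, 1)]

# hasproperty checks whether a column with the given name exists.
has_x1 = hasproperty(df, :x1)
has_z = hasproperty(df, :z)

# A DataFrameRow supports get with a default for column names it does not have.
row = df[1, :]
x2_value = get(row, :x2, 0)
fallback = get(row, :z, 0)
```

Here `hasproperty` takes over the role of the deprecated `haskey` for data frames, while `get` remains available on a single row.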
The major changes are the new indexing rules and full support for broadcasting. The design had to balance four competing goals: ease of use, flexibility, safety, and consistency.
Here are the major highlights:
- you can use `Not` and `Regex` for column indexing
- `df[col]` is now `df[!, col]` and gets/replaces a column in a data frame “as is”
- `df[:, col]` will always get a copy of a column/set a column in place
- `df[cols]` is now `df[!, cols]` and gets a new data frame without copying of columns
- `df[:, cols]` gets a new data frame with copying of columns
- `df.col` is the same as `df[!, col]` for consistency with Base, indicating that it gives you “as is” access to the property of the data frame (i.e. it gives you the column without copying and replaces the column)
- data frames can take part in broadcasting
- you can perform broadcasting assignment to `AbstractDataFrame` and `DataFrameRow`; as a special rule, using the `df[!, col]` syntax you can create a new column or replace an old one using broadcasting (something that is non-standard, since regular broadcasting assignment is always in place).
In summary, `!` indicates an “unsafe” operation. The reason is that people were often tricked by getting a column of a data frame, mutating it (e.g. resizing or sorting), and in consequence corrupting the source data frame. Now we hope that `!` will serve as a warning that this is not a safe operation (as opposed to `:` indexing, which always makes a copy).
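The kind of corruption described above can be sketched with a small, made-up data frame. The column names and helper variables are illustrative assumptions; the point is only that `!` hands out the stored vector while `:` hands out a copy.

```julia
using DataFrames

df = DataFrame(a = [3, 1, 2], b = ["x", "y", "z"])

as_is = df[!, :a]    # the vector stored in df, no copy
copied = df[:, :a]   # a freshly allocated copy

same_as_property = as_is === df.a   # df.a behaves like df[!, :a]
is_copy = copied !== df.a

# Sorting the copy is safe: the data frame is unaffected.
sort!(copied)
a_after_copy_sort = copy(df.a)

# Sorting the "as is" vector reorders column a inside df,
# while column b keeps its old order, so the rows no longer match.
sort!(as_is)
a_after_asis_sort = copy(df.a)
```

After the last step, `df.a` is sorted but `df.b` is not, which is exactly the silent mismatch that `!` is meant to warn about.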
Here are the new rules at work:
julia> df = DataFrame(x1=1:3, x2=2:4, y='a':'c')
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 1 │ 2 │ 'a' │
│ 2 │ 2 │ 3 │ 'b' │
│ 3 │ 3 │ 4 │ 'c' │
julia> select(df, r"x")
3×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 4 │
julia> select(df, Not(r"x"))
3×1 DataFrame
│ Row │ y │
│ │ Char │
├─────┼──────┤
│ 1 │ 'a' │
│ 2 │ 'b' │
│ 3 │ 'c' │
julia> df[Not(1), Not(1)]
2×2 DataFrame
│ Row │ x2 │ y │
│ │ Int64 │ Char │
├─────┼───────┼──────┤
│ 1 │ 3 │ 'b' │
│ 2 │ 4 │ 'c' │
julia> df .+ 1
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 2 │ 3 │ 'b' │
│ 2 │ 3 │ 4 │ 'c' │
│ 3 │ 4 │ 5 │ 'd' │
julia> df .+= ones(Int, size(df))
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 2 │ 3 │ 'b' │
│ 2 │ 3 │ 4 │ 'c' │
│ 3 │ 4 │ 5 │ 'd' │
julia> df[!, :z] .= 1
3-element Array{Int64,1}:
1
1
1
julia> df
3×4 DataFrame
│ Row │ x1 │ x2 │ y │ z │
│ │ Int64 │ Int64 │ Char │ Int64 │
├─────┼───────┼───────┼──────┼───────┤
│ 1 │ 2 │ 3 │ 'b' │ 1 │
│ 2 │ 3 │ 4 │ 'c' │ 1 │
│ 3 │ 4 │ 5 │ 'd' │ 1 │
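The difference between `:` and `!` on the assignment side can be sketched in the same spirit. This is a hedged example based on the target rules described above, not on the exact deprecation-period behavior of v0.19.0 (which may print warnings); the data frame and variable names are invented for the illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])
original = df.a

# `:` broadcasting assignment writes into the existing column vector.
df[:, :a] .= 10
updated_in_place = (df.a === original) && original == [10, 10, 10]

# `!` broadcasting assignment allocates a new column,
# so the element type of the column may change.
df[!, :a] .= 'x'
replaced = (df.a !== original) && eltype(df.a) == Char

# Plain `!` assignment stores the given vector itself, without copying.
v = [1.5, 2.5, 3.5]
df[!, :b] = v
stored_as_is = df.b === v
```

Because `df[!, :b] = v` does not copy, later mutation of `v` also changes `df.b`; pass `copy(v)` if that sharing is not wanted.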