DataFrames.jl v0.19.0 has just been released. It is a major step towards DataFrames.jl 1.0 (we cannot get there yet, as we first have to go through a deprecation cycle).
The number of changes is significant and includes:
API changes:
- allow `Regex` indexing of columns
- allow `Not` from InvertedIndices.jl indexing of rows and columns
- add `!` indexing of rows of `AbstractDataFrame`
- deprecate indexing with column or columns only (like `df[:a]` or `df[1:2]`)
- define target rules for `getindex`, `getproperty`, `setindex!`, and `setproperty!` for `AbstractDataFrame` and `DataFrameRow` (in this release old behavior is deprecated; in the next release it will get replaced by target functionality)
- add indexing using `CartesianIndex{2}` for `AbstractDataFrame`
- full support of broadcasting for `AbstractDataFrame`
- support for broadcasting assignment for `DataFrameRow`
- `keys(::DataFrameRow)` now returns a `Tuple` of column names
- added `get` and `map` methods for `DataFrameRow`
- `categorical!` now accepts columns that contain `missing` values
- `get` and `haskey` for `AbstractDataFrame` are deprecated now
- `empty!` for `DataFrame` is deprecated now
- add `hasproperty` for `AbstractDataFrame`
Fixes:
- improved showing of `DataFrameRow` with zero columns
- fix `combine` with aggregation when `skipmissing=true`
Minor changes:
- improvements in error messages and types of thrown exceptions on error
- various documentation improvements
- improved `getindex` speed for vector of `Bool` indexing
- remove InteractiveUtils.jl dependency
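A few of the smaller API additions listed above can be sketched as follows. This is only an illustration on a small made-up data frame, not an excerpt from the package documentation; the variable names are arbitrary, and `collect` is applied to `keys(row)` so that the snippet does not depend on the exact container type returned.

```julia
using DataFrames

# Small illustrative data frame (made up for this sketch)
df = DataFrame(x1 = [1, 2, 3], x2 = [2, 3, 4])

# Index a single cell with CartesianIndex{2}: row 2, column 1
cell = df[CartesianIndex(2, 1)]

# hasproperty checks whether a column with the given name exists
has_x1 = hasproperty(df, :x1)
has_z = hasproperty(df, :z)

# DataFrameRow: get with a default value, and keys
row = df[1, :]
x2_value = get(row, :x2, missing)   # column exists, so its value is returned
z_value = get(row, :z, 0)           # column is absent, so the default is returned
row_keys = collect(keys(row))       # column names of the row
```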
The major changes are the new indexing rules and full support for broadcasting. Here are the details. In general, the design had to balance four goals that pull against each other: ease of use, flexibility, safety and consistency.
Here are the major highlights:
- you can use `Not` and `Regex` for column indexing
- `df[col]` is now `df[!, col]` and gets/replaces a column in a data frame “as is”
- `df[:, col]` will always get a copy of a column/set a column in place
- `df[cols]` is now `df[!, cols]` and gets a new data frame without copying of columns
- `df[:, cols]` gets a new data frame with copying of columns
- `df.col` is the same as `df[!, col]` for consistency with Base, indicating that it gives you “as is” access to the property of the data frame (i.e. it gives you the column without copying and replaces the column)
- data frames can take part in broadcasting
- you can perform broadcasting assignment to `AbstractDataFrame` and `DataFrameRow`; as a special rule, using the `df[!, col]` syntax you can create a new column or replace an old one using broadcasting (something which is non-standard, as regular broadcasting assignment is always in-place).
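The difference between `!` and `:` when reading columns can be checked with `===`, which tests whether two names refer to the very same vector. The sketch below assumes the target rules described above; the data frame and variable names are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [3, 1, 2], b = ["x", "y", "z"])

no_copy = df[!, :a]    # the vector stored in df, "as is"
copied = df[:, :a]     # a freshly allocated copy

same_as_property = no_copy === df.a   # df.a is the same as df[!, :a]
copy_is_distinct = copied !== df.a

# The same distinction holds when selecting several columns
sub_no_copy = df[!, [:a]]   # new data frame that reuses the column vectors
sub_copy = df[:, [:a]]      # new data frame with copied columns
shares_column = sub_no_copy.a === df.a
owns_column = sub_copy.a !== df.a

# Mutating the copy leaves the source data frame untouched
sort!(copied)
source_unchanged = df.a == [3, 1, 2]
```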
In summary, `!` indicates “an unsafe” operation. The reason is that people were often caught out by getting a column of a data frame, mutating it (e.g. resizing or sorting), and in consequence corrupting the source data frame. We hope that `!` will now serve as a warning that this is not a safe operation (as opposed to `:` indexing, which always makes a copy).
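A minimal sketch of the kind of corruption meant here, using made-up data: sorting or resizing a column obtained with `!` changes the data frame itself, while the same operations on a `:` copy do not.

```julia
using DataFrames

df = DataFrame(a = [3, 1, 2], b = ["three", "one", "two"])

# Sorting the stored column in place reorders :a but not :b,
# so values in a row no longer belong together
sort!(df[!, :a])
a_after_sort = copy(df.a)
b_after_sort = copy(df.b)

# Resizing the stored column leaves the columns with different lengths
push!(df[!, :a], 4)
lengths = (length(df.a), length(df.b))

# The safe variant works on a copy and leaves the data frame intact
safe = DataFrame(a = [3, 1, 2], b = ["three", "one", "two"])
sorted_copy = sort!(safe[:, :a])
safe_a = safe.a
```

After the `push!` call the first data frame is in an inconsistent state, so it should not be used further; it is shown only to illustrate why `!` is a warning sign.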
Here are the new rules at work:
julia> df = DataFrame(x1=1:3, x2=2:4, y='a':'c')
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 1 │ 2 │ 'a' │
│ 2 │ 2 │ 3 │ 'b' │
│ 3 │ 3 │ 4 │ 'c' │
julia> select(df, r"x")
3×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 4 │
julia> select(df, Not(r"x"))
3×1 DataFrame
│ Row │ y │
│ │ Char │
├─────┼──────┤
│ 1 │ 'a' │
│ 2 │ 'b' │
│ 3 │ 'c' │
julia> df[Not(1), Not(1)]
2×2 DataFrame
│ Row │ x2 │ y │
│ │ Int64 │ Char │
├─────┼───────┼──────┤
│ 1 │ 3 │ 'b' │
│ 2 │ 4 │ 'c' │
julia> df .+ 1
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 2 │ 3 │ 'b' │
│ 2 │ 3 │ 4 │ 'c' │
│ 3 │ 4 │ 5 │ 'd' │
julia> df .+= ones(Int, size(df))
3×3 DataFrame
│ Row │ x1 │ x2 │ y │
│ │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1 │ 2 │ 3 │ 'b' │
│ 2 │ 3 │ 4 │ 'c' │
│ 3 │ 4 │ 5 │ 'd' │
julia> df[!, :z] .= 1
3-element Array{Int64,1}:
1
1
1
julia> df
3×4 DataFrame
│ Row │ x1 │ x2 │ y │ z │
│ │ Int64 │ Int64 │ Char │ Int64 │
├─────┼───────┼───────┼──────┼───────┤
│ 1 │ 2 │ 3 │ 'b' │ 1 │
│ 2 │ 3 │ 4 │ 'c' │ 1 │
│ 3 │ 4 │ 5 │ 'd' │ 1 │
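The last step of the session above creates a column with `df[!, :z] .= 1`. The same syntax can also replace an existing column, including with a different element type, because a new vector is allocated instead of writing into the old one. The following is a sketch of that special rule under the target semantics described earlier; the data and names are made up.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])
old_x = df.x                 # keep a reference to the original Int vector

# Broadcasting assignment with `!` replaces the column with a new vector,
# so the element type is free to change
df[!, :x] .= 0.5
new_eltype = eltype(df.x)
replaced = df.x !== old_x

# The same syntax creates a column that does not exist yet
df[!, :y] .= "a"
y_values = df.y
```

The original vector held in `old_x` is left unchanged, which is what distinguishes this form from ordinary in-place broadcasting.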