DataFramesMeta 0.15.0 Announcement
I’m happy to announce the version 0.15.0 release of DataFramesMeta.jl! We’ve added three new features that users will like:
- Multi-column selection in @select
- The @groupby macro for easier grouping syntax
- Column label and note creation with @label! and @note!
Multi-column selection
using DataFramesMeta, CSV, Statistics, Downloads
First, let’s download the starwars dataset from the dplyr repository:
url = "https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv"
starwars = CSV.read(Downloads.download(url), DataFrame; missingstring = "NA")
10×14 DataFrame
Row │ name height mass hair_color skin_color ⋯
│ String31 Int64? Float64? String15? String31 ⋯
─────┼─────────────────────────────────────────────────────────────────────
1 │ Luke Skywalker 172 77.0 blond fair ⋯
2 │ C-3PO 167 75.0 missing gold
3 │ R2-D2 96 32.0 missing white, blue
4 │ Darth Vader 202 136.0 none white
5 │ Leia Organa 150 49.0 brown light ⋯
6 │ Owen Lars 178 120.0 brown, grey light
7 │ Beru Whitesun Lars 165 75.0 brown light
8 │ R5-D4 97 32.0 missing white, red
9 │ Biggs Darklighter 183 84.0 black light ⋯
10 │ Obi-Wan Kenobi 182 77.0 auburn, white fair
Select the :name column
@select starwars :name
Select all columns between :name and :mass, inclusive
@select starwars Between(:name, :mass)
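To make the inclusive behavior of Between concrete, here is a minimal sketch on a small hypothetical data frame that mimics the first few starwars columns (the values are made up for illustration):

```julia
using DataFramesMeta

# Hypothetical toy data frame standing in for starwars
df = DataFrame(name = ["Luke", "Leia"], height = [172, 150],
               mass = [77.0, 49.0], hair_color = ["blond", "brown"])

# Between keeps both endpoints and every column that sits between them
sub = @select df Between(:name, :mass)
names(sub)  # ["name", "height", "mass"]
```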
Select columns whose names start with the letter “h” (some knowledge of regular expressions is required)
@select starwars Cols(r"^h")
Select columns whose names start with “h” OR with “n”
@select starwars Cols(r"^h", r"^n")
Select columns whose names start with “h” AND end with “t”
@select starwars Cols(r"^h", r"t$"; operator = intersect)
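The difference between the default union and operator = intersect is easiest to see on a toy data frame. The sketch below uses a hypothetical one-row data frame whose column names mirror starwars; only height both starts with “h” and ends with “t”:

```julia
using DataFramesMeta

# Hypothetical toy data frame with starwars-like column names
df = DataFrame(name = ["Luke"], height = [172], hair_color = ["blond"],
               homeworld = ["Tatooine"], mass = [77.0])

# Default (union): columns matching either regex
either = @select df Cols(r"^h", r"^n")

# operator = intersect: only columns matching both regexes
both = @select df Cols(r"^h", r"t$"; operator = intersect)
names(both)  # ["height"]
```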
Select all numeric columns (this requires escaping with $). This was possible before, but I’m showing it for completeness.
@select starwars $(names(starwars, Union{Real, Missing}))
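The same escaping works with a variable holding the column names. A minimal sketch, assuming a small hypothetical data frame with missing values: names(df, T) keeps the columns whose element type is a subtype of T, so a Union{Missing, Float64} column still counts as numeric, while a Union{Missing, String} column does not:

```julia
using DataFramesMeta

# Hypothetical toy data frame with some missing values
df = DataFrame(name = ["Luke", "C-3PO"], height = [172, 167],
               mass = [77.0, missing], hair_color = ["blond", missing])

# Names of columns whose element type is a subtype of Union{Real, Missing}
numeric_cols = names(df, Union{Real, Missing})

# $ tells @select to treat the vector as a list of column names
nums = @select df $numeric_cols
names(nums)  # ["height", "mass"]
```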
The @groupby macro
The @groupby macro is a thin wrapper around DataFrames.jl’s groupby. It simply lets you skip the parentheses and brackets, writing @groupby df :a :b instead of groupby(df, [:a, :b]). (It’s also nicer to see the @ in a block of transformations.)
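As a quick check of the thin-wrapper claim, this sketch builds the same grouping both ways and compares the grouping columns and the number of groups:

```julia
using DataFramesMeta

df = DataFrame(a = [1, 1, 2, 2], b = [100, 200, 50, 50])

# Macro form: grouping columns listed without parentheses or brackets
g1 = @groupby df :a :b

# Plain DataFrames.jl form
g2 = groupby(df, [:a, :b])

# Both group by (:a, :b) and produce the groups (1, 100), (1, 200), (2, 50)
groupcols(g1) == groupcols(g2)  # true
```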
df = DataFrame(a = [1, 1, 2, 2], b = [100, 200, 50, 50]);
@chain df begin
@groupby :a
@transform :mean_b = mean(:b)
end
@chain df begin
@groupby :a :b
# $1 refers to the first column (:a), so this counts the rows in each group
@transform :ngroup = length($1)
end
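To see what the two pipelines return, here is a sketch that runs them and inspects the new columns. It redefines df so it is self-contained. @transform on grouped data returns one row per original row, in the original order. The second pipeline uses :a in place of $1, which should be equivalent because :a is the first column:

```julia
using DataFramesMeta, Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [100, 200, 50, 50])

# Group mean of :b, broadcast back to every row of its group
res1 = @chain df begin
    @groupby :a
    @transform :mean_b = mean(:b)
end

# Number of rows in each (:a, :b) group
res2 = @chain df begin
    @groupby :a :b
    @transform :ngroup = length(:a)
end

res1.mean_b  # [150.0, 150.0, 50.0, 50.0]
res2.ngroup  # [1, 1, 2, 2]
```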
Metadata
DataFrames.jl, in conjunction with DataAPI.jl and TableMetadataTools.jl, implements metadata: information attached to a DataFrame and to its columns.
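For a sense of what lies underneath, here is a minimal sketch of attaching a column label through the generic DataAPI.jl metadata functions that DataFrames.jl re-exports. The key "label" is an illustrative choice and not necessarily how DataFramesMeta.jl stores labels internally. The :note style marks metadata that most DataFrames.jl operations propagate:

```julia
using DataFrames

df = DataFrame(wage = [16, 25, 14, 23])

# Attach column-level metadata under an illustrative key "label"
colmetadata!(df, :wage, "label", "Wage (2015 USD)"; style = :note)

# Read it back
colmetadata(df, :wage, "label")  # "Wage (2015 USD)"
```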
DataFramesMeta.jl provides an opinionated format for adding labels and notes to data frames. The @label! and @note! macros are thin wrappers around TableMetadataTools.jl’s label! and note! functions.
Add labels with @label!
df = DataFrame(wage = [16, 25, 14, 23]);
@label! df :wage = "Wage (2015 USD)"
Add notes with @note!
@note! df begin
:wage = "Hourly wage from 2015 American Community Survey (ACS)"
:wage = "Missing values have been dropped"
end
DataFramesMeta.jl also provides printlabels and printnotes for pretty-printing metadata.
Print a list of all columns, showing the labels attached to them:
printlabels(df)
┌────────┬─────────────────┐
│ Column │ Label │
├────────┼─────────────────┤
│ wage │ Wage (2015 USD) │
└────────┴─────────────────┘
Print all notes, along with the labels:
printnotes(df)
Column: wage
────────────
Label: Wage (2015 USD)
Hourly wage from 2015 American Community Survey (ACS)
Missing values have been dropped
Pro-tip: use TerminalPager.jl’s @stdout_to_pager macro to view the output of printnotes in a pager. Then you can search all the column names, labels, and notes in your data set at once.