Summer Happenings for Tidier.jl
- Introducing TidierIteration.jl
- What’s new in TidierData.jl, TidierDB.jl, TidierFiles.jl, and TidierCats.jl
Announcing TidierIteration.jl (v1.0.0)
TidierIteration.jl is a package aimed at making it easier to iterate over collections, modeled after the purrr R package. It also provides some functional programming tools: adverbs, composition, safe functions, and more.
Here’s a list of the supported functions. map_* functions apply a function to a collection and return a collection. walk_* functions work similarly to map_* but do not return anything; they are primarily intended for cases where the function produces side effects only (e.g., saving output to files). modify_* functions update collections in place, and flatten_* functions convert ragged collections (e.g., JSON-style data) into non-ragged collections.
Map
map_tidy, map_values (for iterating on Dict values), map_keys (for iterating on Dict keys), map_dfr, map_dfc, map2, imap, pmap
Walk
walk, walk2, iwalk, pwalk
Modify
modify, modify!, modify_values!, modify_if, modify_if!
Keep, Discard, and Compact
keep, keep!, keep_keys, discard, discard!, compact, compact!
Predicates
is_empty, is_non_empty, every, some, none, detect_index, detect, has_element, has_key, get_value
Adverbs
compose, compose_n, negate, possibly
Flatten
flatten, flatten_n, flatten_dfr, flatten_json, flatten_dfr_json, json_string, to_json
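For readers coming from Base Julia rather than purrr, the map/walk/modify split and the adverbs listed above correspond loosely to `map`, `foreach`, `map!`, and function-returning wrappers. The sketch below uses only Base functions to show those behaviors. It does not call TidierIteration.jl, and `my_possibly` is a hypothetical helper written for illustration, not the package's implementation.

```julia
# Base Julia analogues of the map / walk / modify families and of adverbs.
# Nothing here calls TidierIteration.jl.

xs = [1, 2, 3]

# "map": apply a function and return a new collection.
squares = map(x -> x^2, xs)

# "walk": apply a function only for its side effects; foreach returns nothing.
seen = Int[]
result = foreach(x -> push!(seen, x), xs)

# "modify in place": overwrite the elements of an existing collection.
ys = copy(xs)
map!(x -> x * 10, ys, ys)

# Adverbs take a function and return a new function.
is_odd = !iseven                              # negation
inc_then_double = (x -> 2x) ∘ (x -> x + 1)    # composition: add 1, then double

# Hypothetical "possibly"-style wrapper: return a fallback instead of throwing.
function my_possibly(f, fallback)
    return function (x)
        try
            f(x)
        catch
            fallback
        end
    end
end

safe_parse = my_possibly(s -> parse(Int, s), missing)
parsed = map(safe_parse, ["12", "abc"])
```

Here `xs` is left untouched by every call except the explicit in-place `map!` on its copy, which is the same distinction the `!` suffix signals in the function lists above.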
Why use TidierIteration.jl when Julia already has great iteration capabilities?
- The collection is always the first argument of the map_* family of functions, which makes the functions easier to use inside of chains/pipes
- We extend the map_* family to Julia objects which are not mapped by default, like dictionaries, for which we have map_values() and map_keys()
- We also provide the map2, imap and pmap methods to map over 2 or n elements at the same time
- We provide the flatten_* functions to tidy wild dictionaries (like JSON responses from APIs) and many adverbs.
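To make the first, second, and last points concrete, here is a minimal Base Julia sketch of what collection-first mapping over dictionary values and flattening a nested dictionary can look like. The helpers `my_map_values` and `my_flatten`, and the choice of "_" as the key separator, are hypothetical and written only for this illustration. They are not TidierIteration.jl functions and may differ from how the package behaves.

```julia
# Hypothetical helpers written with Base Julia only; names prefixed with `my_`
# are illustrative and are not part of TidierIteration.jl.

# Collection-first mapping over the values of a Dict, keeping the keys.
my_map_values(d::AbstractDict, f) = Dict(k => f(v) for (k, v) in d)

# Recursively flatten nested Dicts into one level, joining keys with "_".
function my_flatten(d::AbstractDict, prefix::String = "")
    out = Dict{String,Any}()
    for (k, v) in d
        key = isempty(prefix) ? string(k) : string(prefix, "_", k)
        if v isa AbstractDict
            merge!(out, my_flatten(v, key))   # descend into nested levels
        else
            out[key] = v
        end
    end
    return out
end

prices = Dict("apple" => 1.0, "pear" => 2.5)

# Because the collection comes first, the call reads naturally in a pipe.
doubled = prices |> d -> my_map_values(d, x -> 2x)

# A small JSON-like response with two levels of nesting.
response = Dict(
    "id" => 7,
    "user" => Dict("name" => "Ada", "address" => Dict("city" => "London")),
)
flat = my_flatten(response)
```

After flattening, every value sits at the top level under a compound key such as "user_address_city", which is the shape that converts cleanly into a row of a data frame.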
TidierData.jl v0.16.2 released today
The latest version brings in a bugfix and some minor improvements:
- Bugfix: @slice_min() and @slice_max() now respect the n argument
- Adds @head as a convenience wrapper around @slice_head()
- Adds an extra argument for @separate() and a remove argument for @unite()
We’ve also added our first round of syntax comparisons to DataFrames.jl for users who go back and forth between the two packages: Comparison to DF.jl - TidierData.jl (tidierorg.github.io)
There are a number of TidierData.jl “features” we don’t currently highlight on the comparisons, so stay tuned for further expansion of this page.
TidierDB.jl is now up to v0.3.3 and gained a number of improvements over the summer
- The package is much lighter and relies on package extensions for Postgres, ClickHouse, MySQL, MsSQL, SQLite, Oracle, Athena, and Google BigQuery
- Documentation (Getting Started - TidierDB.jl) updated for using these backends
- Adds support for reading from multiple files at once as a vector of paths in db_table when using DuckDB, e.g., db_table(db, ["path1", "path2"])
- Adds streaming support when using DuckDB with @collect(stream = true)
- Allows users to customize file reading via db_table(db, "read_*(path, args)") when using DuckDB
- Adds @head for limiting the number of collected rows
- Adds support for reading URLs in db_table with ClickHouse
- Adds support for reading from multiple files at once as a vector of URLs in db_table when using ClickHouse, e.g., db_table(db, ["url1", "url2"])
- Bugfix: @count updates metadata
- Adds connect() support for Microsoft SQL Server
- Adds show_tables for most backends to view existing tables
- Docs comparing TidierDB to Python’s Ibis: TidierDB.jl vs Ibis - TidierDB.jl
- Docs around working with larger than RAM data: Working With Larger than RAM Datasets - TidierDB.jl
TidierFiles.jl v0.1.4 introduces a general file reader/writer function
Inspired by FileIO.jl and the rio R package, TidierFiles now includes read_file() and write_file() functions that work across all tabular file types supported by the package. This gives you one consistent interface (same arguments) across the file types below, each of which previously required its own bespoke function:
read_csv and write_csv
read_tsv and write_tsv
read_xlsx and write_xlsx
read_delim and write_delim
read_table and write_table
read_fwf
read_sav and write_sav (.sav and .por)
read_sas and write_sas (.sas7bdat and .xpt)
read_dta and write_dta (.dta)
read_arrow and write_arrow
read_parquet and write_parquet
read_rdata (.rdata and .rds)
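One way to picture a general reader is as a dispatcher on the file extension. The sketch below is a Base Julia illustration of that idea for delimited text only. `my_read_file` and its delimiter table are hypothetical, the parsing is deliberately naive, and none of this describes how TidierFiles.jl implements read_file().

```julia
# Hypothetical illustration: choose a parsing strategy from the file extension.
# This is not TidierFiles.jl code and handles only simple delimited text.

my_delimiters = Dict(".csv" => ',', ".tsv" => '\t')

function my_read_file(path::AbstractString)
    ext = lowercase(splitext(path)[2])
    haskey(my_delimiters, ext) || error("unsupported file type: $ext")
    delim = my_delimiters[ext]
    # Naive parsing: split every line on the delimiter (no quoting support).
    return [split(line, delim) for line in eachline(path)]
end

# Write two tiny files so the same call can be shown on both formats.
dir = mktempdir()
csv_path = joinpath(dir, "example.csv")
tsv_path = joinpath(dir, "example.tsv")
write(csv_path, "a,b\n1,2\n")
write(tsv_path, "a\tb\n1\t2\n")

csv_rows = my_read_file(csv_path)
tsv_rows = my_read_file(tsv_path)
```

The caller writes the same line for both files and only the path changes, which is the convenience a single general-purpose reader is meant to offer.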
TidierCats.jl v0.1.2 was released last week
It adds 3 new functions for working with categorical variables:
cat_replace_missing: Replaces missing values in a categorical array with a specified replacement value.
cat_other: Replaces selected levels in a categorical array with the ‘other’ level.
cat_recode: Recodes the levels in a categorical array based on a provided mapping.
It’s been a busy summer for Tidier! We are continuing to work on packages across our ecosystem and welcome users and contributors.
(Sharing on behalf of the Tidier team)