Summer Happenings for Tidier.jl
- Introducing TidierIteration.jl
- What’s new in TidierData.jl, TidierDB.jl, TidierFiles.jl, and TidierCats.jl
Announcing TidierIteration.jl (v1.0.0)
TidierIteration.jl is a package that makes it easier to iterate over collections, modeled after the purrr R package. It also provides functional programming tools: adverbs, composition, safe functions, and more.
Here’s a list of the supported functions. map_* functions apply a function to each element of a collection and return a collection. walk_* functions work similarly to map_* but do not return anything; they are primarily intended for functions that only produce side effects (e.g., saving output to files). modify_* functions update the elements of a collection (in place for the ! variants), and flatten_* functions convert ragged collections (e.g., JSON-style data) into non-ragged collections.
- Map: map_tidy, map_values (for iterating over Dict values), map_keys (for iterating over Dict keys), map_dfr, map_dfc, map2, imap, pmap
- Walk: walk, walk2, iwalk, pwalk
- Modify: modify, modify!, modify_values!, modify_if, modify_if!
- Keep, Discard, and Compact: keep, keep!, keep_keys, discard, discard!, compact, compact!
- Predicates: is_empty, is_non_empty, every, some, none, detect_index, detect, has_element, has_key, get_value
- Adverbs: compose, compose_n, negate, possibly
- Flatten: flatten, flatten_n, flatten_dfr, flatten_json, flatten_dfr_json, json_string, to_json
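To make the family descriptions concrete, here is a rough sketch of the map/walk/modify distinction, plus Dict value/key mapping, using only Base Julia. The helper names (map_sketch, walk_sketch, modify_sketch!, map_values_sketch, map_keys_sketch) are hypothetical, and putting the collection first is our choice to mirror the package's convention; this is an approximation, not TidierIteration.jl's code, so check the package docs for the real signatures.

```julia
# Hypothetical collection-first helpers written with Base Julia only.
# They approximate the ideas behind map_*, walk_*, and modify_*;
# they are not TidierIteration.jl's implementation.

# map: apply f to each element and return a new collection
map_sketch(xs, f) = map(f, xs)

# walk: call f on each element for its side effects and return nothing
function walk_sketch(xs, f)
    foreach(f, xs)
    return nothing
end

# modify!: overwrite each element of xs with f(element), in place
function modify_sketch!(xs::AbstractVector, f)
    for i in eachindex(xs)
        xs[i] = f(xs[i])
    end
    return xs
end

# map_values / map_keys: transform a Dict's values or keys, keeping a Dict
map_values_sketch(d::AbstractDict, f) = Dict(k => f(v) for (k, v) in d)
map_keys_sketch(d::AbstractDict, f) = Dict(f(k) => v for (k, v) in d)

xs = [1, 2, 3]
squares = map_sketch(xs, x -> x^2)            # [1, 4, 9]; xs is unchanged
walk_sketch(xs, x -> println("item ", x))     # prints three lines
modify_sketch!(xs, x -> 10x)                  # xs is now [10, 20, 30]

d = Dict("a" => 1, "b" => 2)
doubled = map_values_sketch(d, v -> 2v)       # Dict("a" => 2, "b" => 4)
upper = map_keys_sketch(d, uppercase)         # Dict("A" => 1, "B" => 2)
```

Because the collection comes first, a pipeline macro such as Chain.jl's @chain, which inserts the previous result as the first argument, can call helpers shaped like these without wrapping them in an anonymous function.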
Why use TidierIteration.jl when Julia already has great iteration capabilities?
- The collection is always the first argument of the map_* family of functions, which makes them easier to use inside chains/pipes
- We extend the map_* family to Julia objects that are not mapped by default, such as dictionaries, for which we provide map_values() and map_keys()
- We also provide the map2, imap, and pmap methods to map over 2 or n inputs at the same time
- We provide the flatten_* functions to tidy wild dictionaries (such as JSON responses from APIs), along with many adverbs
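The flatten_* idea can be illustrated with a small recursive function that turns a nested Dict, such as a parsed JSON response, into a single-level Dict keyed by joined paths. This is a hypothetical sketch: the name flatten_sketch, the "." separator, and leaving arrays untouched are our own choices, and flatten_json's actual output format may differ.

```julia
# Hypothetical sketch: flatten a nested Dict (e.g. parsed JSON) into a
# single-level Dict whose keys are joined paths such as "user.name".
# Not TidierIteration.jl's implementation.
function flatten_sketch(d::AbstractDict; prefix::String = "", sep::String = ".")
    out = Dict{String, Any}()
    for (k, v) in d
        key = isempty(prefix) ? string(k) : string(prefix, sep, k)
        if v isa AbstractDict
            # Recurse into nested objects, carrying the key path along
            merge!(out, flatten_sketch(v; prefix = key, sep = sep))
        else
            # Leaves (numbers, strings, arrays, nothing) are stored as-is
            out[key] = v
        end
    end
    return out
end

response = Dict(
    "id" => 7,
    "user" => Dict("name" => "Ada", "address" => Dict("city" => "London")),
    "tags" => ["a", "b"],
)

flat = flatten_sketch(response)
# flat has the keys "id", "user.name", "user.address.city", and "tags"
```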
TidierData.jl v0.16.2 released today
The latest version brings a bugfix and some minor improvements:
- Bugfix: @slice_min() and @slice_max() now respect the n argument
- Adds @head as a convenience wrapper around @slice_head()
- Adds an extra argument to @separate() and a remove argument to @unite()
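The TidierData.jl docs are the authority on the new extra argument; we assume it mirrors tidyr's separate(), where extra decides what happens when a string splits into more pieces than there are target columns ("drop" discards the surplus, "merge" keeps it in the last column). The Base Julia sketch below only shows those two behaviors on a single string, and the helper split_into is hypothetical.

```julia
# Hypothetical helper mimicking tidyr-style `extra` handling on one string.
# `into` is the number of target columns.
function split_into(s::AbstractString, into::Int, sep::AbstractString; extra::Symbol = :drop)
    if extra === :merge
        # limit keeps any surplus text, separators included, in the last piece
        pieces = split(s, sep; limit = into)
    elseif extra === :drop
        # split fully, then discard pieces beyond the first `into`
        pieces = first(split(s, sep), into)
    else
        throw(ArgumentError("extra must be :drop or :merge"))
    end
    return String.(pieces)
end

split_into("2024-07-15-extra", 3, "-"; extra = :drop)   # ["2024", "07", "15"]
split_into("2024-07-15-extra", 3, "-"; extra = :merge)  # ["2024", "07", "15-extra"]
```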
We’ve also added our first round of syntax comparisons to DataFrames.jl for users who go back and forth between the two packages: Comparison to DF.jl - TidierData.jl (tidierorg.github.io)
There are a number of TidierData.jl “features” we don’t currently highlight in the comparisons, so stay tuned for further expansion of this page.
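In the same spirit as the comparison page, the sketch below uses plain DataFrames.jl calls to show roughly what the fixed @slice_max (assuming the usual call form, e.g. @slice_max(df, score, n = 2)) and the new @head select. The data is made up and deliberately has no ties; TidierData's handling of ties is not reproduced here.

```julia
using DataFrames

df = DataFrame(id = 1:5, score = [10, 40, 20, 50, 30])

# Roughly what @slice_max(df, score, n = 2) selects: the 2 rows with the
# largest scores (this sketch ignores ties)
top2 = first(sort(df, :score, rev = true), 2)

# Roughly what @head returns when asked for 3 rows: the first 3 rows
head3 = first(df, 3)
```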
TidierDB.jl is now up to v0.3.3 and has gained a number of improvements over the summer
- The package is much lighter and relies on package extensions for:
  - Postgres, ClickHouse, MySQL, MsSQL, SQLite, Oracle, Athena, and Google BigQuery
  - Documentation (Getting Started - TidierDB.jl) has been updated for using these backends
- Adds support for reading multiple files at once, passed as a vector of paths to db_table, when using DuckDB
  - i.e., db_table(db, ["path1", "path2"])
- Adds streaming support when using DuckDB with @collect(stream = true)
- Allows users to customize file reading via db_table(db, "read_*(path, args)") when using DuckDB
- Adds @head for limiting the number of collected rows
- Adds support for reading URLs in db_table with ClickHouse
- Adds support for reading multiple files at once, passed as a vector of URLs to db_table, when using ClickHouse
  - i.e., db_table(db, ["url1", "url2"])
- Bugfix: @count now updates metadata
- Adds connect() support for Microsoft SQL Server
- Adds show_tables for most backends to view existing tables
- Docs comparing TidierDB to Python’s Ibis: TidierDB.jl vs Ibis - TidierDB.jl
- Docs on working with larger-than-RAM data: Working With Larger than RAM Datasets - TidierDB.jl
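With DuckDB, db_table(db, ["path1", "path2"]) reads several files as one table. We have not checked the exact SQL TidierDB.jl generates, so the sketch below calls DuckDB directly through DuckDB.jl to show the underlying capability it relies on: DuckDB's read_csv accepts a list of files and stacks them. The file names and data are illustrative.

```julia
using DuckDB, DBInterface, DataFrames

# Write two small CSV files with the same columns to a temporary directory
dir = mktempdir()
path1 = joinpath(dir, "part1.csv")
path2 = joinpath(dir, "part2.csv")
write(path1, "id,value\n1,10\n2,20\n")
write(path2, "id,value\n3,30\n")

# In-memory DuckDB database
con = DBInterface.connect(DuckDB.DB, ":memory:")

# read_csv over a list of files treats them as one stacked table
sql = "SELECT count(*) AS n, sum(value) AS total FROM read_csv(['$path1', '$path2'])"
result = DataFrame(DBInterface.execute(con, sql))

DBInterface.close!(con)
```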
TidierFiles.jl v0.1.4 introduces general file reader/writer functions
Inspired by FileIO.jl and the rio R package, TidierFiles now includes read_file() and write_file() functions that work across all tabular file types supported by the package. This means you can use a consistent interface (the same arguments) with a single function for all of these file types, which previously required the bespoke functions below:
- read_csv and write_csv
- read_tsv and write_tsv
- read_xlsx and write_xlsx
- read_delim and write_delim
- read_table and write_table
- read_fwf
- read_sav and write_sav (.sav and .por)
- read_sas and write_sas (.sas7bdat and .xpt)
- read_dta and write_dta (.dta)
- read_arrow and write_arrow
- read_parquet and write_parquet
- read_rdata (.rdata and .rds)
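A general reader in the style of FileIO.jl and rio usually picks a format from the file extension. The sketch below assumes that approach and maps extensions to the names of the bespoke readers listed above; READERS and reader_for are hypothetical, and this is not TidierFiles.jl's implementation. Formats without a distinctive extension (delimited, whitespace-separated, fixed-width) are left out of this sketch.

```julia
# Hypothetical extension-to-reader table, in the spirit of FileIO.jl and rio.
const READERS = Dict(
    ".csv" => :read_csv,
    ".tsv" => :read_tsv,
    ".xlsx" => :read_xlsx,
    ".sav" => :read_sav, ".por" => :read_sav,
    ".sas7bdat" => :read_sas, ".xpt" => :read_sas,
    ".dta" => :read_dta,
    ".arrow" => :read_arrow,
    ".parquet" => :read_parquet,
    ".rdata" => :read_rdata, ".rds" => :read_rdata,
)

# Return the name of the reader for `path`, matching extensions case-insensitively
function reader_for(path::AbstractString)
    ext = lowercase(last(splitext(path)))  # splitext("a/b.SAV") == ("a/b", ".SAV")
    haskey(READERS, ext) || throw(ArgumentError("unsupported file type: $ext"))
    return READERS[ext]
end

reader_for("results/survey.sav")   # :read_sav
reader_for("exports/model.RDS")    # :read_rdata
```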
TidierCats.jl v0.1.2 was released last week
It adds 3 new functions for working with categorical variables:
- cat_replace_missing: Lumps infrequent levels in a categorical array into an ‘other’ level based on a proportion threshold.
- cat_other: Replaces selected levels in a categorical array with the ‘other’ level.
- cat_recode: Recodes the levels in a categorical array based on a provided mapping.
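We have not checked how TidierCats.jl implements these functions internally; the sketch below uses CategoricalArrays.jl's recode only to show the two operations described for cat_recode (an explicit old => new mapping) and cat_other (sending selected levels to one ‘other’ level). The data is made up.

```julia
using CategoricalArrays

fruit = categorical(["apple", "banana", "apple", "cherry", "durian"])

# Recode levels with an explicit old => new mapping (the idea behind cat_recode)
renamed = recode(fruit, "apple" => "Apple", "banana" => "Banana")

# Send selected levels to a single "other" level (the idea behind cat_other);
# a collection used as a key maps all of its elements to the same value
lumped = recode(fruit, ["cherry", "durian"] => "other")
```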
It’s been a busy summer for Tidier! We are continuing to work on packages across our ecosystem and welcome users and contributors.
(Sharing on behalf of the Tidier team)