[ANN] FlexiJoins.jl: fresh take on joining datasets

FlexiJoins.jl is a fresh take on joining tabular or non-tabular datasets in Julia: Alexander Plavin / FlexiJoins.jl · GitLab.

From simple joins by key, to asof joins, to merging catalogs of terrestrial or celestial coordinates – FlexiJoins supports any use case. The package is registered in General.

I’m not aware of any other similarly general implementation, either in Julia or in Python. At the same time, it’s only 366 lines of code!

Defining features that make the package flexible:

  • Wide range of join conditions: by key (so-called equi-join), by distance, by predicate, the closest match (asof join)
  • All kinds of joins, as in inner/left/right/outer
  • Results can either be a flat list, or grouped by the left/right side
  • Various dataset types transparently supported (not all Tables work, though)
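For readers less familiar with the terminology, the join conditions listed above can be illustrated with a naive nested-loop sketch in plain Julia. This is not how FlexiJoins is implemented and it does not use the package. The toy data and the helper `naive_join` are hypothetical, and the sketch only shows which pairs each kind of condition selects in an inner join.

```julia
# Hypothetical toy data, not taken from the package documentation.
left  = [(id=1, t=0.0), (id=2, t=5.0), (id=3, t=9.0)]
right = [(id=1, t=1.0), (id=2, t=9.5), (id=4, t=5.5)]

# Naive O(n*m) inner join: keep every (l, r) pair for which `cond(l, r)` holds.
naive_join(cond, L, R) = [(l, r) for l in L for r in R if cond(l, r)]

# Equi-join: pairs whose keys are equal.
by_key_pairs = naive_join((l, r) -> l.id == r.id, left, right)

# Distance join: pairs whose `t` values are within 1 of each other.
by_dist_pairs = naive_join((l, r) -> abs(l.t - r.t) <= 1, left, right)

# Predicate join: pairs where the left `t` is strictly smaller than the right `t`.
by_pred_pairs = naive_join((l, r) -> l.t < r.t, left, right)
```

A nested loop like this costs time proportional to the product of the two dataset sizes, which is what specialized join algorithms are meant to avoid.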

With all these features, FlexiJoins is designed to be easy-to-use and fast:

  • Uniform interface to all functionality
  • Performance close to other, less general, solutions: see benchmarks
  • Extensible in terms of both new join conditions and more specialized algorithms

Usage examples showcasing main features:

innerjoin((objects, measurements), by_key(:name))

leftjoin((O=objects, M=measurements), by_key(x -> x.name); groupby=:O)

innerjoin((M1=measurements, M2=measurements), by_key(:name) & by_distance(:time, Euclidean(), <=(3)))

innerjoin(
	(O=objects, M=measurements),
	by_key(:name) & by_pred(:ref_time, <, :time);
	multi=(M=closest,)
)
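The examples above assume that `objects` and `measurements` already exist. The following is a minimal sketch of data they could run on, with made-up values; only the field names `name`, `ref_time` and `time` come from the examples. It also spells out, in plain Julia without the package, what the last example appears to ask for: for each object, the closest measurement with the same name taken after `ref_time`. The helper `naive_closest_after` is hypothetical and reflects a reading of that example, not the package's algorithm.

```julia
# Made-up sample data; only the field names are taken from the examples.
objects = [
    (name="A", ref_time=2.0),
    (name="B", ref_time=5.0),
    (name="C", ref_time=1.0),
]
measurements = [
    (name="A", time=1.0),
    (name="A", time=3.0),
    (name="A", time=7.0),
    (name="B", time=4.0),
    (name="D", time=2.0),
]

# Naive reference: for each object, pick the measurement with the same name
# whose time is the smallest one strictly greater than ref_time.
function naive_closest_after(objects, measurements)
    result = []
    for o in objects
        candidates = [m for m in measurements
                      if m.name == o.name && o.ref_time < m.time]
        isempty(candidates) && continue  # inner join: objects without a match are dropped
        best = candidates[argmin([m.time - o.ref_time for m in candidates])]
        push!(result, (O=o, M=best))
    end
    return result
end

closest_pairs = naive_closest_after(objects, measurements)
```

With FlexiJoins loaded, the calls shown above should accept the same two vectors of named tuples; the `Euclidean()` metric in the third example presumably comes from Distances.jl.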

Documentation with explanations and more examples is available as a Pluto notebook. Docstrings also exist, but are pretty minimal for now.

I’ve been building FlexiJoins piece by piece for some time, based on what I needed. The interface and underlying implementation have proven to be flexible and extensible enough, but comments and suggestions are welcome.


I like that you have joins conditioned on custom predicates.

Indeed, the predicate and distance joins were the main features I missed before, and the main motivation for FlexiJoins. Another was being able to combine join conditions intuitively.