I’m not totally sure what you mean by “lazy” joins and join groups. Can you elaborate? I’m all for efficiency (:
Grouping was one of the first features added to FlexiJoins, and was already present when I originally announced the package in [ANN] FlexiJoins.jl: fresh take on joining datasets. It groups by either the left or the right-hand side. For example, grouping by the left side turns the default flat list of matches [(1, 1), (1, 2), (1, 3), (3, 1)] into [(1, [1, 2, 3]), (2, []), (3, [1])].
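To make the flat-to-grouped transformation concrete, here is a minimal sketch in plain Julia. It only illustrates the shape of the result: `group_by_left` and its `nleft` argument are hypothetical names made up for this example, not part of the FlexiJoins API, and this is not how the package computes groups internally.

```julia
# Flat list of (left index, right index) matches, as in the example above.
flat = [(1, 1), (1, 2), (1, 3), (3, 1)]

# Hypothetical helper (NOT a FlexiJoins function): group flat matches by left index.
# `nleft` is the number of left-side elements, so unmatched ones get an empty group.
function group_by_left(flat, nleft)
    groups = [Int[] for _ in 1:nleft]
    for (i, j) in flat
        push!(groups[i], j)   # collect right-side indices under their left index
    end
    return [(i, groups[i]) for i in 1:nleft]
end

grouped = group_by_left(flat, 3)
```

Here the left element 2 has no matches, so it ends up paired with an empty vector, just like the `(2, [])` entry in the grouped list above.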
All join results are views of the original datasets, whether flat or grouped. Is this what you refer to as “lazy”? However, the indices of matches are always computed eagerly; I don’t think there is a way around that.
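As a rough illustration of what “views, but eager indices” means, here is a sketch using only `view` from Base Julia. The arrays `left` and `left_ix` are made-up example data, and I'm assuming an index vector shaped like the left half of the flat matches above; the actual result types FlexiJoins returns may differ.

```julia
# Made-up left-side dataset.
left = [10, 20, 30]

# Match indices are computed up front (eagerly); here, the left half of the flat matches.
left_ix = [1, 1, 1, 3]

# The joined side itself is only a view: no elements of `left` are copied.
left_view = view(left, left_ix)

# Because it is a view, changes to the original data are visible through it.
left[1] = 99
first_elem = left_view[1]
```

So the only thing materialized is the index vector; the data stays where it was.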
For now, grouped results work with many collections and tables, except for DataFrames. They have a very different interface compared to other collections, so FlexiJoins grouping doesn’t work with them as-is. I believe DataFrames support would be easy to implement, but I’m not sure what a reasonable interface would look like. I don’t really use DataFrames myself, and don’t know what kind of return type their users would expect from a grouped join.