[ANN] DataFrameIntervals.jl — joins on intervals of time

aplavin · July 16, 2022, 6:51am

I’m not totally sure what you mean by “lazy” joins and join groups. Can you elaborate? I’m all for efficiency (:

Grouping was one of the first features added to FlexiJoins and was already present when I originally announced the package in [ANN] FlexiJoins.jl: fresh take on joining datasets. It groups by either the left or the right-hand side. For example, grouping by the left side turns the default flat list of matches [(1, 1), (1, 2), (1, 3), (3, 1)] into [(1, [1, 2, 3]), (2, []), (3, [1])].
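To make the flat-to-grouped transformation concrete, here is a minimal sketch in plain Python. It is not FlexiJoins code: `group_by_left` and `n_left` are hypothetical names made up for this illustration, and it only assumes that matches are (left index, right index) pairs with 1-based indices, as in the example above.

```python
def group_by_left(flat_matches, n_left):
    """Group flat (left, right) index pairs by left index.

    Hypothetical helper for illustration only. Left rows without any
    match are kept with an empty list, like (2, []) in the example.
    """
    # Start with an empty group for every left index (1-based).
    grouped = {i: [] for i in range(1, n_left + 1)}
    for left, right in flat_matches:
        grouped[left].append(right)
    # Dicts preserve insertion order, so groups come out sorted by left index.
    return [(i, rights) for i, rights in grouped.items()]


flat = [(1, 1), (1, 2), (1, 3), (3, 1)]
grouped = group_by_left(flat, n_left=3)
print(grouped)  # [(1, [1, 2, 3]), (2, []), (3, [1])]
```

Grouping by the right-hand side would be the same idea with the roles of the two indices swapped.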

All join results are views of the original datasets, whether flat or grouped. Is this what you refer to as “lazy”? The indices of matches, however, are always computed eagerly; I don’t think there is a way around that.
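The distinction between eager indices and lazy views can be sketched roughly as follows. This is plain Python, not how FlexiJoins is implemented: `IndexView` is a hypothetical class for illustration, and it uses 0-based indices as is usual in Python. The match indices are stored up front, while the elements are only looked up in the original collection on access.

```python
from collections.abc import Sequence


class IndexView(Sequence):
    """Read-only view of `parent` restricted to `indices` (hypothetical).

    Holds a reference to the parent and copies no elements.
    Only integer indexing is supported in this sketch.
    """

    def __init__(self, parent, indices):
        self._parent = parent
        self._indices = indices

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, k):
        # The element is fetched from the parent only at access time.
        return self._parent[self._indices[k]]


left = ["a", "b", "c"]
match_indices = [0, 2]          # computed eagerly, before any element access
view = IndexView(left, match_indices)

left[2] = "C"                   # a change in the original is visible in the view
print(list(view))               # ['a', 'C']
```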

For now, grouped results work with many collections and tables, but not with DataFrames. DataFrames have a very different interface compared to other collections, so FlexiJoins grouping doesn’t work with them as-is. I believe DataFrames support would be easy to implement, but I’m not sure what a reasonable interface would be. I don’t really use DataFrames myself, and I don’t know what return type their users would expect from a grouped join.
