Question regarding Query.jl @join implementation

gbenatt92 · November 29, 2018, 3:33pm

Hi! I sent this question to Professor @davidanthoff by DM, but it’s better suited here.

Regarding the Query.jl package, I’d like to know how @join and @groupjoin are implemented, how efficient they are, and how they work under the lazy streaming model.

I’m curious about this because joining tables efficiently is an old problem with many different approaches, and the package works with so many different table types.

Thanks a lot for your time!

davidanthoff · November 29, 2018, 5:12pm

You can see the implementation for @join here and for @groupjoin here.

Both use a hash join algorithm. In LINQ terminology, one can classify a query operator’s execution model as immediate vs deferred, and streaming vs non-streaming (here is the exact discussion). Both @join and @groupjoin currently have a deferred, non-streaming execution model in our implementation. That means nothing is computed when the query is constructed, but the first call to iterate triggers full iteration of both join sources and a complete materialization of the join output.
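To make the "deferred, non-streaming" behaviour concrete, here is a minimal sketch in Python. It is not Query.jl's code; the names `traced` and `deferred_hash_join` are made up for illustration. It only assumes the behaviour described above: no work at construction time, then both sources fully consumed and the whole result built on the first request.

```python
from collections import defaultdict


def traced(name, rows, log):
    """Hypothetical helper: yields rows and records every pull in `log`."""
    for row in rows:
        log.append(name)
        yield row


def deferred_hash_join(outer, inner, outer_key, inner_key, result):
    """Generator, so the body does not run until the first item is requested."""
    # Build phase: consume the whole inner source into a hash table.
    table = defaultdict(list)
    for row in inner:
        table[inner_key(row)].append(row)
    # Probe phase: consume the whole outer source and materialize every match.
    output = []
    for row in outer:
        for match in table.get(outer_key(row), ()):
            output.append(result(row, match))
    # Only now are results handed out, one at a time.
    yield from output


log = []
people = traced("outer", [(1, "Ann"), (2, "Bob")], log)
orders = traced("inner", [(1, "book"), (1, "pen"), (3, "cup")], log)
joined = deferred_hash_join(
    people, orders,
    lambda p: p[0], lambda o: o[0],
    lambda p, o: (p[1], o[1]),
)

pulled_before = len(log)       # deferred: nothing has been pulled yet
first = next(joined)
pulled_after_first = len(log)  # non-streaming: every row of both sources pulled
rest = list(joined)
```

The point of the trace is that asking for a single result already costs a full pass over both inputs plus memory for the complete output.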

That could actually be improved, though: those implementations could be changed so that at least one of the join sources is not fully iterated in the first call to iterate, and instead properly streamed.
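A rough sketch of what such an improvement could look like, again in Python and with hypothetical names rather than anything from Query.jl: the inner source is still hashed up front, but the outer source is probed lazily, so each result is produced as soon as its outer row has been read.

```python
from collections import defaultdict


def traced(name, rows, log):
    """Hypothetical helper: yields rows and records every pull in `log`."""
    for row in rows:
        log.append(name)
        yield row


def streaming_probe_hash_join(outer, inner, outer_key, inner_key, result):
    """Hash join that materializes only the inner side and streams the outer side."""
    # Build phase: the inner source is still fully consumed on the first request.
    table = defaultdict(list)
    for row in inner:
        table[inner_key(row)].append(row)
    # Probe phase: outer rows are pulled one at a time, and matches are
    # yielded immediately instead of being collected into a list first.
    for row in outer:
        for match in table.get(outer_key(row), ()):
            yield result(row, match)


log = []
people = traced("outer", [(1, "Ann"), (2, "Bob")], log)
orders = traced("inner", [(1, "book"), (1, "pen"), (3, "cup")], log)
joined = streaming_probe_hash_join(
    people, orders,
    lambda p: p[0], lambda o: o[0],
    lambda p, o: (p[1], o[1]),
)

first = next(joined)
outer_pulled_after_first = log.count("outer")  # only the first outer row so far
inner_pulled_after_first = log.count("inner")  # the whole inner source
rest = list(joined)
```

Which side to hash is a design choice; hashing the smaller source keeps memory use down, and that is the kind of decision a query optimizer could make.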

Our medium-term project (which we are just starting now) is to create another backend for Query.jl that is specific to tabular sources, and then provide a full query optimizer that can pick more efficient algorithms depending on the query being executed. Don’t expect anything in the next couple of months, but we are trying out different designs right now.
