You can see the implementation for
@join here and for
Both use a hash join algorithm. In the LINQ terminology one can classify a query operator’s execution model as immediate vs deferred, and streaming vs non-streaming (here is the exact discussion). Both
@groupjoin have a deferred, non-streaming execution model in our implementation right now. What that means is that the first call to
iterate will trigger full iteration of both sources for the join, and a complete materialization of the output of the join.
That could actually be improved, though: those implementations could be changed so that at least one of the join sources is not fully iterated in the first call to
iterate, and instead properly streamed.
Our medium term project (that we are kind of starting now) is to create another backend for Query.jl that is specific to tabular sources, and then provide a full query optimizer that can pick more efficient algorithms, depending on the query one is trying to execute. Don’t expect anything in the next couple of months, but we are trying out different designs etc. right now.