[ANN] KWayMerges.jl: Small package for the k-way merge algorithm

I’m pleased to introduce the newest BioJulia package: KWayMerges.jl - currently undergoing registration.

This tiny package provides the KWayMerger type, which lazily merges multiple sorted iterators into a single, sorted stream. It’s generally faster than appending all the iterators and calling sort!.

For example:

julia> using KWayMerges

julia> vs = [[4,6], [7,9], [1,5]];

julia> [last(i) for i in KWayMerger(vs)]
6-element Vector{Int64}:
 1
 4
 5
 6
 7
 9

Currently, it’s implemented using a heap. The Wikipedia article claims a tournament tree is faster, so soon-ish I’ll try re-implementing the KWayMerger using one of those and see if it’s faster. But for now, I just wanted something that I knew worked.

10 Likes

This maybe a basic Julia question, but:

help?> KWayMerger
...

  This iterator yields (index::Int, x::T) elements

Is there a reason (performance or otherwise) that it can’t yield @NamedTuple{fromiter::Int, value::T} elements instead? Remembering what the two tuple elements mean is a small thing in isolation, but can get pretty annoying and negatively affect readability when part of a large complex codebase. NamedTuples would still allow numeric-index-based and first/last access if preferred, but also allow nicer, more ergonomic access and display. I don’t know what the other side of the tradeoff is though, if any.

That’s a good idea. I should probably also add a sort-like API supporting by, lt, rev and order, if those are zero-cost.

1 Like