I’m pleased to introduce the newest BioJulia package: KWayMerges.jl - currently undergoing registration.
This tiny package provides the KWayMerger type, which lazily merges multiple sorted iterators into a single, sorted stream. It’s generally faster than appending all the iterators and calling sort!.
For example:
julia> using KWayMerges
julia> vs = [[4,6], [7,9], [1,5]];
julia> [last(i) for i in KWayMerger(vs)]
6-element Vector{Int64}:
1
4
5
6
7
9
Currently, it’s implemented using a heap. The Wikipedia article claims a tournament tree is faster, so soon-ish I’ll try re-implementing the KWayMerger using one of those and see if it’s faster. But for now, I just wanted something that I knew worked.
11 Likes
This maybe a basic Julia question, but:
help?> KWayMerger
...
This iterator yields (index::Int, x::T) elements
Is there a reason (performance or otherwise) that it can’t yield @NamedTuple{fromiter::Int, value::T} elements instead? Remembering what the two tuple elements mean is a small thing in isolation, but can get pretty annoying and negatively affect readability when part of a large complex codebase. NamedTuples would still allow numeric-index-based and first/last access if preferred, but also allow nicer, more ergonomic access and display. I don’t know what the other side of the tradeoff is though, if any.
That’s a good idea. I should probably also add a sort-like API supporting by, lt, rev and order, if those are zero-cost.
1 Like