# Sorted product of infinite iterators

I have a collection of (potentially infinite) sorted iterators `I1`, `I2`, `I3`, … and a nondecreasing value function `sort_by((x1, x2, x3, ...))`. I want the sorted cartesian product of these iterators.

I’m thinking I can do this with a heap (adding a set may in certain edge cases result in a speedup), but I want to see if anyone else already has already implemented a solution to this problem.

(If `I1` and `I2` are the naturals and `sort_by` is `+`, then this provides a constructive proof that `N x N` is countable.)

I’m a little unclear on what you mean by “sorted iterators” and “sorted product” - sorted by what metric, respectively?

If I understand correctly, `sorted_by` is your metric and you want the values (i.e. the tuples) produced by the iterators if you were to iterate them simultaneously to be sorted by that metric?

For finite iterators, by sorted I mean `issorted(I)` returns true. By sorted by sorted_by I mean `issorted(product, bt=sorted_by)` returns true. For infinite iterators I mean every prefix is sorted.

I want a result that is equivalent to `sort(vec(collect(product(I1, I2, I3, ...))), by=sorted_by)` but without `collect`ing.

I don’t think there is a good way to do this algorithmically. pretty much any good sorting algorithm will need to collect the data.

We can assume that the input iterators are sorted and sorted_by is nondecreasing, so we know that the first element of the output should be the first elements of the inputs. I think heapsort can get n log n in this case.

You don’t mean `product` then. The output of `product` is tuples with one entry from each iterator.

Sorry, `sorted_by` accepts a tuple, not three arguments. I’ve added a set of parentheses in my OP to reflect this.

I’ve implemented it. The important bits are in the implementation of the two-argument `iterate`. I’ve included a bit more for context, and you can see the complete package here.

``````function SortedIteratorProduct(by::Function, iterators...)
sources = cached.(iterators)
SortedIteratorProduct(sources, by)
end

lookup(sip, x) = tuple((s[i] for (s, i) in zip(sip.sources, x))...)
function Base.iterate(sip::SortedIteratorProduct)
all(x -> checkbounds(Bool, x, 1), sip.sources) || return nothing
one = map(_->1, sip.sources)
iterate(sip, (Set((one,)), BinaryHeap(Base.By(x -> (sip.by(lookup(sip, x)), reverse(x))), [one])))
end
function Base.iterate(sip::SortedIteratorProduct, (set, heap))
isempty(heap) && return nothing
indices = pop!(heap)

for i in eachindex(indices)
new = ntuple(j -> indices[j] + (j == i), length(indices))
if checkbounds(Bool, sip.sources[i], indices[i]+1) && new ∉ set
push!(set, new)
push!(heap, new)
end
end
lookup(sip, indices), (set, heap)
end
``````
1 Like

Nice! With a bit more bookkeeping you can discard items from `set` once everything immediately higher in each direction is in `set`. That is, only keep the frontier. I assume this would typically save n^{1/d} memory.

Thanks! I think that would work. My use case is not sensitive to space complexity, so I won’t bother, but I’ll write it down just in case.