Searchsorted with predicate

Often when using the searchsorted function family, I wish predicate versions of those functions existed:

searchsorted(==(5), array)
searchsorted(>(5), array)
searchsortedfirst(>(5), array)
searchsortedlast(<=(5), array)
searchsortedlast(>=(5), array; rev=true)

Note that these aren’t trivial to implement correctly and performantly based on existing searchsorted functions.
Would interface like this make sense in Base or in a package? Or maybe it already exists somewhere?

  • The first example seems challenging with binary search (since the function is not monotone, hence, it is hard to find the starting point and the binary search might overlook the relevant range).

  • For monotone functions you could use LazyArrays.jl. If the function is monotone decreasing, than one would need to add rev=true. For example

using LazyArrays

array = rand(1:10, 100)
searchsorted(BroadcastArray(>(5), array), 1)
searchsorted(BroadcastArray(<(5), array), 1, rev=true)
searchsortedfirst(BroadcastArray(>(5), array), 1)
searchsortedfirst(BroadcastArray(<(5), array), 1, rev=true)
searchsortedlast(BroadcastArray(>=(5), array), 1)
searchsortedlast(BroadcastArray(<=(5), array), 1, rev=true)

I don’t see any issues here. Base searchsorted does basically this already, and the exact syntax from my first post is easy to add on top:

julia> Base.searchsorted(f::Base.Fix2{typeof(==)}, A) = searchsorted(A, f.x)

julia> searchsorted(==(5), [1, 2, 3, 5, 5, 5, 10])
4:6

That’s pretty clever!
I already use mapped arrays (a much lighter mapview instead of BroadcastArray) for searchsorted in a different context - when searching by f(A[i]) == x instead of A[i] == x:

using SplitApplyCombine

searchsorted(mapview(f, A), x)

However, for <, >, ... searches, the interface in the first post would be simpler and more consistent than searchsorted(mapview(>(5), A), 1). A couple of issues with the latter:

  • doesn’t support == predicate
  • the rev argument doesn’t correspond to how the array is sorted: its meaning is different between sort(A; rev) and searchsorted(mapview(<(5), A), 1; rev) leading to potential confusion
  • Not extensible: the predicate interface I outlined also allows for stuff like searchsorted(∈(1:5), A) and searchsorted(∈(1..5), A). With the current searchsorted interface, the IntervalSets package had to define a separate function searchsorted_interval(A, 1..5).
  • not intuitive, while searchsorted(>(5), A) is basically a more efficient drop-in replacement for findall(>(5), A) when the array is sorted.

What kind of predicates do you have in mind?

  • Only ==, >, <, <=, >=, ∈(a:b) with special implementations for each?
  • Or are you aiming at arbitrary predicates p(x)::Bool = ...?

I think the situation with arbitrary predicates would be pretty difficult. Only had comparison operators in mind, so basically this list:

+ extensions by packages, eg for intervals.

Oh, and the mapview/BroadcastArray solution can be significantly slower. For A = 1:10000, x = 1234 they are 25 ns instead of 3 ns for searchsortedfirst($A, $x).