How to use the `by` argument to `searchsortedlast()`

I am trying to find the index of the last element of a such that f(a) <= 10. You can do this with findlast:

julia> a = [1, 2, 3, 4, 5]
5-element Vector{Int64}:

julia> findlast(x -> x^2 ≤ 10, a)

However, I know that my input is sorted and that f(a) is a monotonic transformation, so I would like to use binary search instead. I wrote my own binary search routine inline, but then discovered that Julia has a searchsortedlast() function that should serve my purposes. However, I can’t get seem to get the by argument to work:

julia> searchsortedlast(a, 10, by= x->x^2)

This is identical to the output of searchsortedlast(a, 10).

Of course, you can transform a locally as follows:

julia> searchsortedlast(a .^ 2, 10)

But this defeats the purpose of binary search because it requires you to evaluate a[i]^2 at every index.

Hi, that’s an interesting find. Probably warants a PR to the docs if not a discussion about the expected behavior of this. note to self: search for existing issues before investigating

This is what happens:
You end up calling this function:

function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer
    u = T(1)
    lo = lo - u
    hi = hi + u
    @inbounds while lo < hi - u
        m = midpoint(lo, hi)
        if lt(o, v[m], x)
            lo = m
            hi = m
    return hi

Where lo and hi are your first and last indices respectively and o is an ordering that stores your transformation. In this case it’s the By ordering here.
The interesting part happens in the line

if lt(o, v[m], x)

which calls these two:

lt(o::By, a, b) = lt(o.order,,
lt(o::ForwardOrdering, a, b) = isless(a,b)

Which in a condensed form means it checks isless(by(v[m]) , by(x)) at each step where by is the transformation you gave it.
So it really just applies the transformation to the given x value, too, which is probably not what you expected. If that’s intended or not needs to be answered by someone else though :slight_smile:

There are at least two related issues already:

1 Like

it really just applies the transformation to the given x value, too

Well that’s … interesting. I have been wracking my brain for a use case for this and the best I can come up with is a “database query” type of task along the lines of, “In this list of employees which is sorted by income, who is the highest-paid employee who makes less money than Sam, whose salary is 20?”

salaries = [
    ("Bob", 10)
    ("Bill", 15)
    ("Alice", 50)
    ("Junior", 800)
searchsortedlast(salaries, ("Sam", 20), by= x->x[2])
# 2

So basically, you have to know ahead of time a “left inverse” of your by function at the target value. In the search task in the OP, this is equivalent to doing searchsortedlast(a, sqrt(10), by = x->x^2).

In my case, f() is basically a black box: a is an index of target objective values for an optimization problem, f(a) is a function that checks if a feasible solution exists attaining that objective, and the problem is to find the largest a that admits a feasible solution (link to Github code).

It would be nice to have an interface that works like findfirst() here.

Here is an awful workaround:

a = [1, 2, 3, 4, 5]
my_square(x::NamedTuple) = x.k
my_square(x::Number) = x^2
searchsortedlast(a, (; k=10), by = my_square)
# 3

Edit: Oh, I see your edit now. Thanks!

Edit 2: This is quite a rabbit hole:

It seems like there have been many PRs to add this functionality, but most of them made suboptimal implementation choices or misleading keyword argument names. I hope a more consistent search/sort interface can be a priority for 2.0.

Agreed :smiley:

I guess the takeaway is to ignore the by keyword altogether, which wouldn’t be that bad imho. It just needs a tiny nudge in the right direction in the docstrings of these functions.

Gotta check if there’s an issue set for 2.0… Edit: found nothing, may write something when I have time

Ref: existing discourse post