# How to searchsortedfirst when each element requires expensive transformation

I have a problem where I have an expensive function `expensive()` and I want to find the first integer `i` in `0:n` such that `expensive(i) > a` where `a` is some constant and we know that `expensive(j) > a` for `j >= i`. I want to minimize the number of calls to `expensive()`. The inbuilt function `searchsortedfirst` does something similar in that it finds the index of a sorted array the first element that is greater than or equal to a threshold. Is there a way I can use that function but only evaluate `expensive()` as needed?

I think `searchsortedfirst` does what you want, just providing by what function you want the elements to be evaluated.

``````julia> x = sort(rand(10))
10-element Array{Float64,1}:
0.007505240568856264
0.170164020425573
0.22180879545782117
0.2423166237740766
0.4174836106540334
0.4778024570655919
0.5637894682887481
0.7442152133727271
0.9259268508181902
0.9533116492072238

julia> expensive(x) = x #any function

julia> searchsortedfirst(x,0.5,by=x->expensive(x))
7
``````

(That this calls the least number of times the “expensive” function is something that I would trust in the implementation of this search)

edit: this is not working as I expected, and I don’t know why, as described below.

1 Like

I am not sure why @lmiq has deleted his answer, but I think it is correct. You should call `searchsortedfirst` passing `expensive` to the `by` keyword parameter. Obviously, this assumes the property you mentioned, that the vector/range has the guarantee to be sorted by this `expensive` function even if you do not really know which exact values it would return.

The code for that is a little obfuscated inside the source (because it is generated by metaprogramming) but you can check this behaviour by putting print statements inside the `expensive` function.

My original answer was that using a transformation in searcshortedfirst would do what he wanted. However, I do not understand how these transformations are being interpreted in the `searchsortedfirst` function:

``````julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3

julia> searchsortedfirst(x,2)
2

julia> searchsortedfirst(x,2,by=x->x-1)
2

``````

Shouldn’t the `searchsortedfirst(x,2,by=x->x-1)` return `3`?

`searchsortedfirst` also applies the `by` function to the input `2`. For example:

``````julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3

julia> searchsortedfirst(x, 1.5; by = x -> (println(x); x));
2
1.5
1
1.5
``````

So I don’t think using the `by` keyword is the answer here (even because the expensive function would be called several times for the same input). Maybe wrapping `0:n` into a new `AbstractVector` type that does the expensive computation when calling `getindex` is the best answer.

1 Like

I suppose the `searchsortedfirst` evaluates the function a minimal number of times (or at least it is a reasonable implementation of that) to find the element, doesn’t it?

(Concerning the fact that the function is applied also to the condition, that is unnatural to me, but thanks for clarifying that).

Taking this into account, you would need to know one `x` such that `expensive(x)` is the threshold you want to follow the code I provided there.

My example shows that `1.5` is printed two times, so `by` is applied to `1.5` two times. I didn’t dig into the source code, but I suppose `by` will be called on the same input for each comparison using `isless` (or the `lt` keyword).

Sure, but that is a particular case in which the search implies at least two evaluations:

``````julia> x = collect(1:100);

julia> searchsortedfirst(x, 20.5; by = x -> (println(x); x));
50
20.5
25
20.5
12
20.5
18
20.5
21
20.5
19
20.5
20
20.5

``````

(evaluated 7 times)

1 Like

Of course. That’s what I meant. Note that `by` was evaluated 14 times (two for each of the 7 comparisons).

2 Likes

Just to make it clearer:

``````struct NewType{T, V, F} <: AbstractVector{T}
v::V
f::F
NewType{T}(v::V, f::F) where {T, V, F} = new{T, V, F}(v, f)
end
Base.size(x::NewType) = size(x.v)
Base.axes(x::NewType) = axes(x.v)
Base.@propagate_inbounds Base.getindex(x::NewType, I...) = x.f(x.v[I...])

x = 1:100
input = NewType{Int}(x, x -> (println(x); x))

searchsortedfirst(input, 20.5);
``````

output:

``````50
25
12
18
21
19
20
``````
1 Like

@lucas711642 @lmiq thank you so much! I used the NewType method suggested by @lucas711642. I think it would be more intuitive if this was the default behavior of the inbuilt function.

I think the reason for the current behavior is that `searchsortedfirst(v, x)` is supposed to give you the position at which `x` must be inserted in `v` so that the resulting vector is still sorted. In this case, `by` must be applied to `x` too.

To be sincere, I don’t understand what you did there. Doesn’t this line evaluate the function for every element of the list?

I’m just creating an instance of `NewType`, according to the constructor defined in:

The parameter `T` is used just to give the compiler the `eltype` of `NewType`. If this optimization isn’t necessary, we could just define the following:

``````struct NewType{V, F} <: AbstractVector{Any}
v::V
f::F
end
``````

But this has type-instability problems.

I’m assuming that `T` is the type the type of each element. Maybe a better definition of `getindex` is given by:

``````Base.@propagate_inbounds Base.getindex(x::NewType{T}, I...) where T = convert(T, x.f(x.v[I...]))::T
``````

The `::T` guarantees that `x[I...] isa eltype(x)`.

To sum up:

``````struct NewType{T, V, F} <: AbstractVector{T}
v::V
f::F
NewType{T}(v::V, f::F) where {T, V, F} = new{T, V, F}(v, f)
end
Base.size(x::NewType) = size(x.v)
Base.axes(x::NewType) = axes(x.v)
Base.@propagate_inbounds Base.getindex(x::NewType{T}, I...) where T = convert(T, x.f(x.v[I...]))::T

input1 = NewType{Int}(1:100, x -> x ^ 2)
eltype(input1) # Int
input1 # outputs 25 (Int)

input2 = NewType{Float64}(1:100, x -> x ^ 2)
eltype(input2) # Float64
input2 # outputs 25. (Float64)
``````
4 Likes

I think you mean in the REPL… Of course, if you execute this on the terminal you need the semicolon. Otherwise, it will call `display` which in turn will call `getindex` and execute the function unnecessarily.

1 Like

Perhaps my lack of understanding comes from the fact that, when I put this line in the REPL, I get as a result, instantly, the complete vector of `x^2`:

``````julia> input1 = NewType{Int}(1:100, x -> x ^ 2)
100-element NewType{Int64,UnitRange{Int64},var"#3#4"}:
1
4
9
16
25
⋮
9216
9409
9604
9801
10000

``````

If, instead, I suppress the `show` function, should I understand that the the same computation was not performed? (sorry if my question is too stupid. To be truth, it easier to me to understand the Base.searchsortedfirst function and implement it with modifications - the function is quite straightforward - than to get used to these fancy manipulation of types).

Great, that was it.

1 Like

Let me just finish here by saying that what I love about Julia is that we can write the code we are more familiar with, and get nice performance. Here I have just implemented a simple `my_firstsortedfirst` function to solve the problem above. Doing this is much more natural to me than using the elegant solution of @lucas711642, which I think involves understanding much deeper things about the workings of the type system:

``````using Test
using BenchmarkTools

function my_searchsortedfirst(v,x;by=x->x)
lo = 0
hi = length(v)
@inbounds while hi > lo + 1
m = trunc(Int64,(hi+lo)/2)
if x > by(v[m])
lo = m
else
hi = m
end
end
return hi
end

# Expensive function that does not change the order
function f(x)
s = 0.
for i in 1:1_000_000
s += sqrt(x/i)
end
s
end

# lucas way
struct NewType{T, V, F} <: AbstractVector{T}
v::V
f::F
NewType{T}(v::V, f::F) where {T, V, F} = new{T, V, F}(v, f)
end
Base.size(x::NewType) = size(x.v)
Base.axes(x::NewType) = axes(x.v)
Base.@propagate_inbounds Base.getindex(x::NewType, I...) = x.f(x.v[I...])

# input
x = f.(rand(1:100,10))
v = collect(1:100)

# test
t1 = [ my_searchsortedfirst(v,i,by=x->f(x)) for i in x ]
input = NewType{Int}(v, x -> f(x))
t2 = [ searchsortedfirst(input,i) for i in x ]
@test t1 == t2

# benchmark
x = f(20)
@btime my_searchsortedfirst(\$v,\$x,by=x->f(x))
@btime searchsortedfirst(\$input,\$x)

``````

Result:

``````julia> include("./my_searchsortedfirst.jl")
10.527 ms (0 allocations: 0 bytes)
10.530 ms (0 allocations: 0 bytes)
``````

So cool 2 Likes

Cool! One minor thing that might be worth mentioning is that it is a good practice to write just `f` instead of `x -> f(x)`. In this way you do not define a new function, but rather just reference to an existing one.

For example, there is an `identity` function that implements exactly `x -> x`, so your definition of `my_searchsortedfirst` could be written as `my_searchsortedfirst(v, x; by = identity)`.

2 Likes

By the way - `NewType` is essentially a simplified version of `MappedArray` from MappedArrays.jl, a function + array wrapper which lazily applies the transformation. For example:

``````A = mappedarray(f, 0:n)
searchsortedfirst(A, x)
``````
3 Likes