I have a sorted array `a` of positive integers (related to integer partitions), and need to repeatedly find the first index with a value >= `k`, for various `k` uniformly distributed between 1 and `a[end]`. This is exactly the description of `searchsortedfirst`, but there doesn't appear to be a way to take advantage of the fact that I know the values will be concentrated around `√(2n)`, where `n` is the length of the array, which in this case is 80. In the sample below, the `target` array I'm searching has typical behavior.

```
using BenchmarkTools

# The number of partitions of n with largest part at
# most k is stored at table[ (n*n-n)>>1 + k ].
struct PNums
    table::Vector{Int64}
end

function PNums(max_n::Int64)
    pnum = Vector{Int64}(undef, (max_n*max_n + max_n) >> 1)
    for n = 1:max_n
        noff = (n*n - n) >> 1
        pnum[noff + 1] = 1
        for k = 2:n
            pnum[noff + k] = pnum[noff + k - 1] +
                             pnum[min(max(n - k, 1), k) + ((n - k)*(n - k - 1)) >> 1]
        end
    end
    return PNums(pnum)
end

n = 80
target = PNums(n).table[(n*n - n)>>1 + 1 : (n*n + n)>>1]

function f1(a::Vector{Int64}, b::Vector{Int64})
    for k = 1:a[end]
        b[k] = searchsortedfirst(a, k)
    end
end

dest = Vector{Int64}(undef, target[end])
@btime f1(target, dest)  # Gives 164.066 ms (0 allocations: 0 bytes)
```

I tried a few methods to take advantage of the known concentration of values, but my best result came from a two-phase search: an exponential (galloping) search to find a starting point, followed by a linear search.

```
function f2(a::Vector{Int64}, b::Vector{Int64})
    for k = 1:a[end]
        i = 1
        @inbounds while i << 1 < 80 && a[i << 1] < k
            i = i << 1
        end
        @inbounds while a[i] < k
            i += 1
        end
        b[k] = i
    end
end

dest2 = Vector{Int64}(undef, target[end])
@btime f2(target, dest2)  # Gives 100.187 ms (0 allocations: 0 bytes)
```

A large portion of the speed-up seems related to the fact that `80` is hard-coded into `f2`.
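One way to check how much the hard-coded bound matters is a variant that reads the length from the argument (an untested sketch, assuming the `i << 1 < 80` guard is meant to generalize to `length(a)`; `f2_generic` is a hypothetical name):

```julia
# Same galloping + linear search as f2, but with the array length
# taken from the argument instead of the hard-coded 80.
function f2_generic(a::Vector{Int64}, b::Vector{Int64})
    n = length(a)
    for k = 1:a[end]
        i = 1
        # Exponential phase: double i while a[i<<1] is still below k.
        @inbounds while i << 1 < n && a[i << 1] < k
            i = i << 1
        end
        # Linear phase: walk forward to the first index with a[i] >= k.
        @inbounds while a[i] < k
            i += 1
        end
        b[k] = i
    end
end
```

Benchmarking this against `f2` on `target` would isolate the constant-propagation effect from the rest of the algorithm.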

This leads to a few questions:

- Is there a standard way to take advantage of value concentration for searching a sorted array?
- Should there be a standard way?
- Is there a simple way to do better than `f2` in this particular case? What if I need a few (say 20) different-length arrays?
- How much of the improvement is just from eliminating the overhead of a function call?
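For the several-lengths case, one speculative sketch (untested, and only worthwhile if the speed-up really does come from the compile-time bound) is to bake the length into the method as a `Val` type parameter, so each of the ~20 lengths gets its own specialized method; `f2_val` is a hypothetical name:

```julia
# Same search as f2, but the bound N is a compile-time constant
# supplied per call site via Val, e.g. f2_val(target, dest, Val(80)).
function f2_val(a::Vector{Int64}, b::Vector{Int64}, ::Val{N}) where {N}
    for k = 1:a[end]
        i = 1
        @inbounds while i << 1 < N && a[i << 1] < k
            i = i << 1
        end
        @inbounds while a[i] < k
            i += 1
        end
        b[k] = i
    end
end
```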