~50% of my compute time ends up being spent finding the right index into a sorted array of floats, given another float.

At first I used findfirst, but this is the best I got for the moment:

fetch(vec,x) = vec[sum(y -> y <= x, vec)]
using BenchmarkTools
@btime fetch(vec,x) setup=(vec = sort([-Inf,randn(1000)...]); x = randn())

Since I know the array will be sorted, maybe there is a faster approach? The stochastic nature of the setup corresponds more-or-less, in probability, to the cases I will encounter…

Indeed, that does the job. In particular, searchsortedlast was what I wanted.
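For reference, the drop-in replacement would be something like this (fetch2 is a hypothetical name; the -Inf sentinel from the benchmark setup guarantees searchsortedlast returns at least 1):

```julia
# O(log n) binary search instead of the O(n) scan; same result on sorted input.
fetch2(vec, x) = vec[searchsortedlast(vec, x)]

vec = sort([-Inf; randn(1000)])
x = randn()
fetch2(vec, x) == vec[sum(y -> y <= x, vec)]  # true: agrees with the scan version
```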

Weirdly, this benchmarking setup shows a 5x improvement from searchsortedlast over the previous function, but my actual code only shows a 10% speedup.

One more trick I could try: the vec vector is actually the same for many searches. Maybe I could pre-compute something from it, like a continuous approximation that maps inputs to their right indices after rounding?

Then the problem could be in a different part of the code.
Maybe you should post a more complete MWE.
In the meantime, experiment a bit: try transforming the vector into a dictionary (maybe you even need to redefine the hash function to compare keys)

I’m not sure about OrderedSets… But reimplementing searchsortedlast by hand gave me a 30% speedup:

function my_ssl(v, x)
    lo = 0
    hi = length(v) + 1
    @inbounds while lo < hi - 1
        m = lo + ((hi - lo) >>> 0x01) # overflow-safe midpoint of two integers
        if x < v[m]
            hi = m
        else
            lo = m
        end
    end
    return lo
end

Yep, probably I should try to produce a bigger MWE…

I am coding on the GPU, which might be the reason there is a difference, but I noted that using:

c = 0
NeighborCellIndex = 0
@inbounds for i ∈ eachindex(UniqueCells)
    c += 1
    if LinearIndices(Tuple(UniqueCells[end]))[SCellIndex] == LinearIndices(Tuple(UniqueCells[end]))[UniqueCells[i]]
        # if SCellIndex == UniqueCells[i]
        NeighborCellIndex = c
    end
end

instead of using findfirst directly gives me a speedup as well. The reason I use the weird LinearIndices construction is that I am working with CartesianIndices and found that comparing those is quite a bit slower.

function my_ssl(v, x, last_loc=1)
    lo = 0
    hi = length(v) + 1
    m = last_loc
    @inbounds while lo < hi - 1
        if x < v[m]
            hi = m
        else
            lo = m
        end
        m = lo + ((hi - lo) >>> 0x01)
    end
    return lo
end

This should still converge on the correct answer even if the guess is off.
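A quick self-contained check of that claim; this is the same function as above except for an added clamp (my addition) so that even an out-of-range guess stays safe:

```julia
# Binary search whose first probe is at `guess`; any guess converges to the answer.
function my_ssl_guessed(v, x, guess=1)
    lo = 0
    hi = length(v) + 1
    m = clamp(guess, 1, length(v))  # keep the first probe in bounds (safety net)
    @inbounds while lo < hi - 1
        if x < v[m]
            hi = m
        else
            lo = m
        end
        m = lo + ((hi - lo) >>> 0x01)
    end
    return lo
end

v = sort(randn(1000))
x = randn()
# good guess, bad guess, out-of-range guess: all agree with Base
all(my_ssl_guessed(v, x, g) == searchsortedlast(v, x) for g in (-5, 1, 500, 9999))  # true
```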

Could you give us a single instance of the problem, so we could benchmark stuff? I mean, a real list (preferably one which tends to be repeated many times) and a list of 1-10 values which are actually searched for?

The level and speed of response is so much better when there is something concrete to optimize. Even a 1000-element vector can fit nicely in a post, perhaps using something like:
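For example, one possible way to get a copy-pasteable literal of the vector (repr is just one option):

```julia
v = sort(round.(randn(1000), digits=3))  # rounding keeps the printed literal short
s = repr(v)       # a valid Julia array literal as a string
println(s)        # paste this output directly into a post
```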

@Dan Here is a more complete MWE with data included: MWE · GitHub

I used searchsortedlast in this code, since the other options from this thread do not perform as well on my real data as they seemed to on synthetic data… But there might be other parts of my code to improve too

Quick insight from the MWE: the searchsorted array contains a grid, i.e. regularly spaced values. In that case no search is needed, just a rounded division. Something along the lines of:
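For instance, assuming a perfectly regular grid with spacing prec starting at gridstart (names borrowed from the snippet further down; a sketch only, and exact grid points may need floating-point care):

```julia
prec, gridstart = 0.25, 1.0
grid = collect(gridstart:prec:10.0)

# Index of the last grid point <= val: one division and a clamp, no search.
regular_ssl(val) = clamp(floor(Int, (val - gridstart) / prec) + 1, 0, length(grid))

regular_ssl(3.7) == searchsortedlast(grid, 3.7)  # true
```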

The grid is not perfect; sometimes some values are missing. table.age looks perfectly regular, but table.years is not: a few values are missing at the beginning.

But maybe I could jump-start the binary search with a rounded division indeed. Let me try that.

But now I see the array does not contain only regularly spaced values. Still, this may be a good starting point for an optimization…

Specifically, keeping an “index” array for grid which contains the indices of the regularly spaced part of the array will allow doing searchsorted only inside the bin from the regular grid. Something like (this is still untested):

using IterTools
# cumulative start index of each regular bucket, with a final sentinel entry
gridindex = cumsum(vcat(1, length.(collect(IterTools.groupby(x -> round(Int, x / prec), grid)))))
gridindex[end] = length(grid)
gridstart = 1.0 # according to MWE code
# for each element `val` (bucket is clamped to stay in bounds; note the -1 offset
# when translating the within-bucket result back to a whole-grid index)
bucket = clamp(round(Int, (val - gridstart) / prec) + 1, 1, length(gridindex) - 1)
foundindex = gridindex[bucket] - 1 + searchsortedlast(@view(grid[gridindex[bucket]:gridindex[bucket+1]]), val)

By the way, I had forgotten a multiplication in the gist for table.age and table.year: the unit should be days, not years (the regularity is the same, though); I just corrected it.

I will try to use what you just posted, but frankly I do not understand it yet

The snippet I posted sort of gave a result (i.e. it compiled) on my machine. The idea is to use the regularly spaced part to split the sorted grid into little sorted buckets. The specific bucket to search is easily found using division (and rounding), and the result within the bucket is then translated back to the whole list using the index of the bucket start.

There may be some edge issues, especially if the non-regular bits of grid fall outside the regular 1:prec:... part.
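To make the bucket idea concrete, here is a self-contained toy version (all names hypothetical; the grid has a regular spine plus a few extra values, loosely like the MWE):

```julia
# Regular spine 0:1:4 plus some irregular extras, sorted.
prec, gridstart = 1.0, 0.0
grid = [0.0, 1.0, 1.3, 1.7, 2.0, 3.0, 3.5, 4.0]

# bucketstart[b] = index of the first grid value in bucket b, plus a sentinel.
nbuckets = floor(Int, (last(grid) - gridstart) / prec) + 1
bucketstart = [searchsortedfirst(grid, gridstart + (b - 1) * prec) for b in 1:nbuckets+1]

function bucketed_ssl(grid, val)
    b = clamp(floor(Int, (val - gridstart) / prec) + 1, 1, nbuckets)
    lo, hi = bucketstart[b], bucketstart[b + 1] - 1
    # binary search only inside the (small) bucket, then translate back
    lo - 1 + searchsortedlast(view(grid, lo:hi), val)
end

bucketed_ssl(grid, 1.5) == searchsortedlast(grid, 1.5)  # true
```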

I think you might be focusing on the wrong searchsortedlast: the one where we look up the grid is not the problematic one, as it happens only in the top-level loop, while the two in the λ function happen in the inner loop (and the profiler says they are the problem).

Moreover if you want to check that you are not breaking the results, you can compare to this plot:

I see. Well, the first searchsortedlast, the one on R.ages, is trivial: R.ages is a regular range, so the answer can be had with a simple round.
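For an actual range, that "simple round" might look like this sketch (I believe Base's searchsortedlast already specializes on ranges to do essentially this, so the gain is mainly in avoiding the generic vector path):

```julia
# searchsortedlast on a regular range via one division (0 means "before the range").
range_ssl(r::AbstractRange, x) = clamp(floor(Int, (x - first(r)) / step(r)) + 1, 0, length(r))

r = 0.0:0.5:10.0
range_ssl(r, 3.2) == searchsortedlast(r, 3.2)  # true
```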

And the R.years one can also be done with a simple index into a year-indexed array, as the years are not too many (it’s a waste to search an array for this).

Indeed, and the R.years one is a bit more complicated, as its grid is not completely regular. I’d like to keep something general (i.e. something that would work for any grids R.ages and R.years) but that somehow assumes these grids are almost regular. Maybe a search jump-started by this rounding is the right option then

Yes, but keep in mind the snippet above. The “index” into buckets of the sorted array will shrink the search range most of the time, and it is easily pre-computed.

Much of the difficulty in sorting and searching algorithms is getting a “feel” for the distribution of values in the problem instance. Here the ages and years have a pretty well-known distribution, and thus it is reasonable to expect a faster solution than a general algorithm.

In this case — and, more generally, whenever you have a reasonable guess for the index — it can be hugely advantageous to do an “expanding” binary search to bracket the region where you expect to find the result. Basically, instead of converging, the lo and hi values in my_ssl should move away from the guess until the target is enclosed, then you can feed this into something like my_ssl instead of the very first and last indices. In fact, searchsortedfirst / last have methods that take these brackets, which speeds things up.
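A sketch of such an expanding (galloping) bracket, with the final bisection inlined so it does not rely on any internal bracketed method of searchsortedlast:

```julia
# Gallop outward from `guess` until v[lo] <= x < v[hi] (virtual sentinels at 0
# and n+1), then bisect inside that bracket. Cheap when the guess is close.
function gallop_ssl(v, x, guess)
    n = length(v)
    n == 0 && return 0
    g = clamp(guess, 1, n)
    step = 1
    if v[g] <= x                     # expand upward from the guess
        lo = g
        while lo + step <= n && v[lo + step] <= x
            lo += step
            step *= 2
        end
        hi = min(lo + step, n + 1)
    else                             # expand downward from the guess
        hi = g
        while hi - step >= 1 && v[hi - step] > x
            hi -= step
            step *= 2
        end
        lo = max(hi - step, 0)
    end
    while lo < hi - 1                # ordinary bisection inside the bracket
        m = lo + ((hi - lo) >>> 0x01)
        x < v[m] ? (hi = m) : (lo = m)
    end
    return lo
end

v = sort(randn(1000))
x = randn()
all(gallop_ssl(v, x, g) == searchsortedlast(v, x) for g in (1, 500, 1000))  # true
```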

I added this to DataInterpolations.jl, but now looking for the link I see that it was recently factored out into FindFirstFunctions.jl. The “correlated” in the name searchsortedlastcorrelated is because of its history with interpolations, where you repeatedly call with correlated values. Maybe searchsortedlastguess would be more descriptive of what you want, but I think that’s the function you want.