Built-in binary search for sorted collections

Leo_I · April 24, 2025, 2:13pm

Given integers N and k, I’d like to elegantly find the largest integer n for which binomial(n,k) <= N. A way to do this is

function max_n_binom_leq(N::Integer, k::Integer) 
    return maximum(n for n=k:N if binomial(big(n),k) ≤ N) 
end

This is wasteful. because it has complexity O(N) instead of O(log N). The function n -> binomial(big(n),k) is strictly increasing. A better way is to do bisection (binary search). But I wish to avoid implementing it on my own. Does Julia have some built-in methods to solve this elegantly? I would especially like to avoid allocating the whole array k:N, since N might be very large, like typemax(UInt128).

Jeff_Emanuel · April 24, 2025, 2:33pm

searchsortedlast

Something like searchsortedlast(k:N,N,by=n->binomial(big(n),k)) + (k-1)

GunnarFarneback · April 24, 2025, 3:18pm

You don’t really have to search much to find this. Even with a very crude approximation like

binomial(n, k) = n * (n - 1) * ... * (n - (k - 1)) / k! ≈ (n - (k - 1) / 2)^k / k!

you can solve for n in closed form and end up somewhere close. With better approximations you can probably get spot on.

Leo_I · April 26, 2025, 3:50pm

@Jeff_Emanuel Your suggestion gives an incorrect answer:

julia> N, k = typemax(UInt16), 10
julia> max_n_binom_leq(N,k)
18
julia> searchsortedlast(k:N,N,by=n->binomial(big(n),k)) + (k-1)
65535

@GunnarFarneback Maybe in my particular case, an approximation solves the problem, but not in general. I’ve decided to write my own function:

function sortedfindfirst(x::AbstractVector{te}, f::F) ::te  where {te, F<:Function}
    a = BigInt(1)
    c = BigInt(hasmethod(step,(typeof(x),)) ? floor((big(last(x)) - big(first(x))) / big(step(x))) + 1 : length(x))  # Range vs Vector
    @assert c≥1
    @assert isa(f(x[c]), Bool) && f(x[c])
    while a+1 < c   # bisection loop
        b = a+(c-a)÷2  # equivalent to (a+c)÷2 but does not overflow
        (a,c) = !f(x[a]) && f(x[b]) ? (a,b) : (b,c)
    end
    return f(x[a]) ? x[a] : x[c] 
end

For example, to numerically solve the transcendental equation x=log(x)^2, we’d just use:

julia> @time sortedfindfirst(0:1e-10:1, x -> x>log(x)^2)
  0.027249 seconds (28.80 k allocations: 1.460 MiB, 99.59% compilation time)
0.4948664146

abraemer · April 26, 2025, 7:47pm

Leo_I:

function sortedfindfirst(x::AbstractVector{te}, f::F) ::te  where {te, F<:Function}
    a = BigInt(1)
    c = BigInt(hasmethod(step,(typeof(x),)) ? floor((big(last(x)) - big(first(x))) / big(step(x))) + 1 : length(x))  # Range vs Vector
    @assert c≥1
    @assert isa(f(x[c]), Bool) && f(x[c])
    while a+1 < c   # bisection loop
        b = (a+c)÷2
        (a,c) = !f(x[a]) && f(x[b]) ? (a,b) : (b,c)
    end
    return f(x[a]) ? x[a] : x[c] 
end

That seems a bit more convoluted than necessary. You can index ranges just like a Vector and using BigInt as indices should not be necessary.
So I think you could just write:

function sortedfindfirst(f::F, x::AbstractVector{te}) ::te  where {te, F<:Function}
    a = firstindex(x)
    c = lastindex(x)
    @assert c≥1
    @assert isa(f(x[c]), Bool) && f(x[c])
    while a+1 < c   # bisection loop
        b = (a+c)÷2
        (a,c) = !f(x[a]) && f(x[b]) ? (a,b) : (b,c)
    end
    return f(x[a]) ? x[a] : x[c] 
end

I also put the function argument first as is convention in Julia.
(I am on mobile and did not run this code)

bertschi · April 26, 2025, 9:18pm

The problem with searchsortedlast is documented here:

help?> searchsortedlast
...
 Note that the by function is applied to the searched value x as well as the values in v.

It can be made to work though, by applying the function to the values only, i.e.,

let f(n) = binomial(big(n),k); (k:N)[searchsortedlast(f.(k:N), N)] end

which unnecessarily evaluates binomal for all input values though. Yet, we can delay evaluation and use the by argument to force the values needed:

macro delay(expr) :(() -> $(esc(expr))) end
force(x) = x()
let f(n) = @delay binomial(big(n),k); (k:N)[searchsortedlast(f.(k:N), @delay N; by = force)] end

Leo_I · April 27, 2025, 10:00am

That seems a bit more convoluted than necessary. You can index ranges just like a Vector and using BigInt as indices should not be necessary.

I would prefer to avoid a convoluted solution, yes, but there are problems:

julia> N=typemax(UInt64);  x = 0:N;  typeof(lastindex(x)), lastindex(x), length(x)
(UInt64, 0x0000000000000000, 0x0000000000000000)

@abraemer Any advice on how to handle those cases?

@bertschi Thanks, but that sounds way too complicated for just a mere bisection use.

abraemer · April 27, 2025, 11:06am

Oof, nice edge case
Will searching all values of a number type be a common case for you?
Anyhow for this I case I would create another method:

function sortedfindfirst(f::F, ::T) ::te  where {T::Type{<:Integer}, F<:Function}
    a = typemin(T)
    c = typemax(T)
    I = oneunit(T)
    I2 = I+I
    @assert isa(f(c), Bool) && f(c)
    while a+I < c   # bisection loop
        b =  (a+c)÷I2
        (a,c) = !f(a) && f(b) ? (a,b) : (b,c)
    end
    return f(a) ? a : c
end

Again untested because I am on mobile.
For Floats you’d need a slightly different code.

Leo_I · April 27, 2025, 11:15am

Hmm, doubling the amount of code just for this case is ugly to me. Besides, it seems somehow wrong to me that lastindex and lengthof 0:typemax(UInt)) is 0 even though iterating through this range is nonempty. Could this be considered a bug in Base Julia?

The casting seems inconsistent to me:

julia> for te∈(Int8,UInt8,Int16,UInt16,Int32,UInt32,Int64,UInt64,Int128,UInt128)  
       x = typemin(te):typemax(te);  idx=lastindex(x); len=length(x); println(typeof(idx)," ",typeof(len)," ",big(idx)," ",big(len)) end
Int64 Int64 256 256
Int64 Int64 256 256
Int64 Int64 65536 65536
Int64 Int64 65536 65536
Int64 Int64 4294967296 4294967296
Int64 Int64 4294967296 4294967296
Int64 Int64 0 0
UInt64 UInt64 0 0
Int128 Int128 0 0
UInt128 UInt128 0 0

bertschi · April 27, 2025, 12:26pm

That certainly sounds like a bug … seems like the line a += stop - start # Signed are allowed to go negative in the length(::AbstractUnitRange) method overflows in this case.

bertschi · April 27, 2025, 12:30pm

Well, yes … but you can easily hide that behind a short function.
Further, implementing bisection from scratch also needs care (see here for a common bug which also applies to the code in this thread).

sgaure · April 27, 2025, 6:37pm

That’s an interesting question. It’s not easily fixable, since length would have to be changed to return either a Int128 or longer, or a BigInt in this case, making it type unstable or breaking. The problem comes from the fact that julia does not use a “mathematical” integer for literals, lengths and sizes, like some languages do (cryptol springs to mind), but an ordinary bit integer.

sgaure · April 27, 2025, 6:51pm

I’ve entered an issue for the segfault from

collect(0:typemax(UInt))

github.com/JuliaLang/julia

Segfault in edge case UnitRange{UInt64}

opened 06:50PM - 27 Apr 25 UTC

sgaure

``` julia> versioninfo() Julia Version 1.11.5 Commit 760b2e5b739 (2025-04-14 06:…53 UTC) Build Info: Official https://julialang.org/ release Platform Info: OS: Linux (x86_64-linux-gnu) CPU: 16 × AMD Ryzen 7 2700X Eight-Core Processor WORD_SIZE: 64 LLVM: libLLVM-16.0.6 (ORCJIT, znver1) Threads: 16 default, 0 interactive, 8 GC (on 16 virtual cores) Environment: julia> collect(0:typemax(UInt)) [1332714] signal 11 (1): Segmentation fault in expression starting at REPL[2]:1 setindex! at ./array.jl:987 [inlined] Array at ./range.jl:1375 [inlined] Array at ./boot.jl:605 [inlined] collect at ./range.jl:1380 ``` From https://discourse.julialang.org/t/built-in-binary-search-for-sorted-collections/128374/12?u=sgaure

Leo_I · April 27, 2025, 7:36pm

@bertschi Thank you for pointing out the insidious bug in my code: b = (a+c)÷2 must be replaced with b = a+(c-a)÷2 to not overflow. Appreciate that!

@sgaure Thank you for discovering even a segfault from this bug and opening an issue!

sgaure · April 27, 2025, 8:39pm

It transpired in the issue discussion that arithmetic overflow checking was intentionally removed from length in deprecate unsafe_length for length by vtjnash · Pull Request #40382 · JuliaLang/julia · GitHub. The reason being that possibly throwing an error in such an essential function adds complexity which meddles with optimization.

So it’s probably better to work around it. (But the segfault must be fixed anyway).

Jeff_Emanuel · April 28, 2025, 2:26pm

Sorry about that, I intending to just give you a starting point, which is what I meant by “something like”. I probably should have just left the post at a pointer to the documentation. I’m glad you got a working implementation yourself. Sometimes its more expedient to implement an algorithm yourself than it is to find the right incantation of a library function.

Topic		Replies	Views
Computing binomial(n,k) without overflow General Usage	14	475	January 28, 2024
How to use the `by` argument to `searchsortedlast()` General Usage	3	377	April 15, 2022
Bucket sorting in Julia, with functions General Usage question , hpc	1	331	September 13, 2022
Faster default Base.sort! for Julia Performance sort	4	1043	July 29, 2021
Sorted Array implementation? General Usage array , sort	21	7923	February 6, 2024

Built-in binary search for sorted collections

Related topics