Built-in binary search for sorted collections

Given integers N and k, I’d like to elegantly find the largest integer n for which binomial(n,k) <= N. A way to do this is

function max_n_binom_leq(N::Integer, k::Integer) 
    return maximum(n for n=k:N if binomial(big(n),k) ≤ N) 
end

This is wasteful. because it has complexity O(N) instead of O(log N). The function n -> binomial(big(n),k) is strictly increasing. A better way is to do bisection (binary search). But I wish to avoid implementing it on my own. Does Julia have some built-in methods to solve this elegantly? I would especially like to avoid allocating the whole array k:N, since N might be very large, like typemax(UInt128).

searchsortedlast

Something like searchsortedlast(k:N,N,by=n->binomial(big(n),k)) + (k-1)

You don’t really have to search much to find this. Even with a very crude approximation like

binomial(n, k) = n * (n - 1) * ... * (n - (k - 1)) / k! ≈ (n - (k - 1) / 2)^k / k!

you can solve for n in closed form and end up somewhere close. With better approximations you can probably get spot on.

@Jeff_Emanuel Your suggestion gives an incorrect answer:

julia> N, k = typemax(UInt16), 10
julia> max_n_binom_leq(N,k)
18
julia> searchsortedlast(k:N,N,by=n->binomial(big(n),k)) + (k-1)
65535

@GunnarFarneback Maybe in my particular case, an approximation solves the problem, but not in general. I’ve decided to write my own function:

function sortedfindfirst(x::AbstractVector{te}, f::F) ::te  where {te, F<:Function}
    a = BigInt(1)
    c = BigInt(hasmethod(step,(typeof(x),)) ? floor((big(last(x)) - big(first(x))) / big(step(x))) + 1 : length(x))  # Range vs Vector
    @assert c≥1
    @assert isa(f(x[c]), Bool) && f(x[c])
    while a+1 < c   # bisection loop
        b = a+(c-a)÷2  # equivalent to (a+c)÷2 but does not overflow
        (a,c) = !f(x[a]) && f(x[b]) ? (a,b) : (b,c)
    end
    return f(x[a]) ? x[a] : x[c] 
end

For example, to numerically solve the transcendental equation x=log(x)^2, we’d just use:

julia> @time sortedfindfirst(0:1e-10:1, x -> x>log(x)^2)
  0.027249 seconds (28.80 k allocations: 1.460 MiB, 99.59% compilation time)
0.4948664146
1 Like

That seems a bit more convoluted than necessary. You can index ranges just like a Vector and using BigInt as indices should not be necessary.
So I think you could just write:

function sortedfindfirst(f::F, x::AbstractVector{te}) ::te  where {te, F<:Function}
    a = firstindex(x)
    c = lastindex(x)
    @assert c≥1
    @assert isa(f(x[c]), Bool) && f(x[c])
    while a+1 < c   # bisection loop
        b = (a+c)÷2
        (a,c) = !f(x[a]) && f(x[b]) ? (a,b) : (b,c)
    end
    return f(x[a]) ? x[a] : x[c] 
end

I also put the function argument first as is convention in Julia.
(I am on mobile and did not run this code)

1 Like

The problem with searchsortedlast is documented here:

help?> searchsortedlast
...
 Note that the by function is applied to the searched value x as well as the values in v.

It can be made to work though, by applying the function to the values only, i.e.,

let f(n) = binomial(big(n),k); (k:N)[searchsortedlast(f.(k:N), N)] end

which unnecessarily evaluates binomal for all input values though. Yet, we can delay evaluation and use the by argument to force the values needed:

macro delay(expr) :(() -> $(esc(expr))) end
force(x) = x()
let f(n) = @delay binomial(big(n),k); (k:N)[searchsortedlast(f.(k:N), @delay N; by = force)] end
1 Like

That seems a bit more convoluted than necessary. You can index ranges just like a Vector and using BigInt as indices should not be necessary.

I would prefer to avoid a convoluted solution, yes, but there are problems:

julia> N=typemax(UInt64);  x = 0:N;  typeof(lastindex(x)), lastindex(x), length(x)
(UInt64, 0x0000000000000000, 0x0000000000000000)

@abraemer Any advice on how to handle those cases?

@bertschi Thanks, but that sounds way too complicated for just a mere bisection use.

1 Like

Oof, nice edge case :sweat_smile:
Will searching all values of a number type be a common case for you?
Anyhow for this I case I would create another method:

function sortedfindfirst(f::F, ::T) ::te  where {T::Type{<:Integer}, F<:Function}
    a = typemin(T)
    c = typemax(T)
    I = oneunit(T)
    I2 = I+I
    @assert isa(f(c), Bool) && f(c)
    while a+I < c   # bisection loop
        b =  (a+c)÷I2
        (a,c) = !f(a) && f(b) ? (a,b) : (b,c)
    end
    return f(a) ? a : c
end

Again untested because I am on mobile.
For Floats you’d need a slightly different code.

1 Like

Hmm, doubling the amount of code just for this case is ugly to me. Besides, it seems somehow wrong to me that lastindex and lengthof 0:typemax(UInt)) is 0 even though iterating through this range is nonempty. Could this be considered a bug in Base Julia?

The casting seems inconsistent to me:

julia> for te∈(Int8,UInt8,Int16,UInt16,Int32,UInt32,Int64,UInt64,Int128,UInt128)  
       x = typemin(te):typemax(te);  idx=lastindex(x); len=length(x); println(typeof(idx)," ",typeof(len)," ",big(idx)," ",big(len)) end
Int64 Int64 256 256
Int64 Int64 256 256
Int64 Int64 65536 65536
Int64 Int64 65536 65536
Int64 Int64 4294967296 4294967296
Int64 Int64 4294967296 4294967296
Int64 Int64 0 0
UInt64 UInt64 0 0
Int128 Int128 0 0
UInt128 UInt128 0 0

That certainly sounds like a bug … seems like the line a += stop - start # Signed are allowed to go negative in the length(::AbstractUnitRange) method overflows in this case.

Well, yes … but you can easily hide that behind a short function.
Further, implementing bisection from scratch also needs care (see here for a common bug which also applies to the code in this thread).

1 Like

That’s an interesting question. It’s not easily fixable, since length would have to be changed to return either a Int128 or longer, or a BigInt in this case, making it type unstable or breaking. The problem comes from the fact that julia does not use a “mathematical” integer for literals, lengths and sizes, like some languages do (cryptol springs to mind), but an ordinary bit integer.

I’ve entered an issue for the segfault from

collect(0:typemax(UInt))
1 Like

@bertschi Thank you for pointing out the insidious bug in my code: b = (a+c)÷2 must be replaced with b = a+(c-a)÷2 to not overflow. Appreciate that!

@sgaure Thank you for discovering even a segfault from this bug and opening an issue!

It transpired in the issue discussion that arithmetic overflow checking was intentionally removed from length in deprecate unsafe_length for length by vtjnash · Pull Request #40382 · JuliaLang/julia · GitHub. The reason being that possibly throwing an error in such an essential function adds complexity which meddles with optimization.

So it’s probably better to work around it. (But the segfault must be fixed anyway).

1 Like

Sorry about that, I intending to just give you a starting point, which is what I meant by “something like”. I probably should have just left the post at a pointer to the documentation. I’m glad you got a working implementation yourself. Sometimes its more expedient to implement an algorithm yourself than it is to find the right incantation of a library function.