Poor BigInt performance

The function here is not exactly Fibonacci, so a -5 to 256 cache would cover only 4 nfib values (1, 2, 9, 44). But it's still an elephant (maybe a younger one).
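
For context, the kind of small-value interning being alluded to would look roughly like this. It's only a sketch, and SMALL_BIGS/smallbig are made-up names, not anything GMP or Julia actually do:

# Hypothetical sketch: precompute BigInts for a small range once and reuse them,
# so repeated conversions of small values don't each allocate a fresh BigInt.
const SMALL_BIGS = Dict{Int,BigInt}(i => big(i) for i in -5:256)

smallbig(i::Integer) = get(() -> big(i), SMALL_BIGS, i)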

Reusing only 2 numbers reduces the computing time by almost a factor of 2. That's the elephant.

Another thing I should have considered is that n-2 and n-1 occur for most return values, so saving those conversion allocations helps much more than saving the allocations of a couple of return values. We can do that for the 4 base-case values too, though this doesn't elide the allocations of n-big2 or n-big1 themselves, even when their results equal one of those values.

Weirdly, the change in the OP to n-2 and n-1 doesn't improve the runtime for me (probably version/system differences), but doing it for the 4 values does, by ~2.5x. The allocation counts are more similar: 1.13 G > 921.01 M > 351.79 M. So this particular elephant isn't so big.

julia> begin
       const big1 = big(1)
       const big2 = big(2)
       const big3 = big(3)
       const big9 = big(9)
       const big4 = big(4)
       const big44 = big(44)
       function nfib4(n) # meaning I check up to n=4
         if n == big1      big1
         elseif n == big2  big2
         elseif n == big3  big9
         elseif n == big4 big44
         else (nfib4(n-big2)+nfib4(n-big1))*n
         end
       end
       end
nfib4 (generic function with 1 method)

julia> @time nfib4(big(40))
 36.879051 seconds (351.79 M allocations: 6.990 GiB, 24.84% gc time)
12306496796223503426182275975073985352177589230680

julia> @time nfib4(big(40))
 32.710493 seconds (351.79 M allocations: 6.990 GiB, 23.03% gc time)
12306496796223503426182275975073985352177589230680

julia> nfib2(n) = n > big2 ? (nfib2(n-big2)+nfib2(n-big1))*n : n
nfib2 (generic function with 1 method)

julia> @time nfib2(big(40))
 91.455750 seconds (921.01 M allocations: 18.299 GiB, 25.44% gc time)
12306496796223503426182275975073985352177589230680

julia> @time nfib2(big(40))
 86.134056 seconds (921.01 M allocations: 18.299 GiB, 27.48% gc time)
12306496796223503426182275975073985352177589230680

julia> nfib(n) = n > 2 ? (nfib(n-2)+nfib(n-1))*n : n
nfib (generic function with 1 method)

julia> @time nfib(big(40))
 89.184372 seconds (1.13 G allocations: 18.299 GiB, 27.74% gc time)
12306496796223503426182275975073985352177589230680

julia> @time nfib(big(40))
 90.009020 seconds (1.13 G allocations: 18.299 GiB, 27.19% gc time)
12306496796223503426182275975073985352177589230680

It isn't clear to me why the parameter of nfib shouldn't just be an Int (it is never actually big); only the return values need to be BigInts. For example:

const big1 = big(1)
const big2 = big(2)
const big9 = big(9)
const big44 = big(44)
function nfib5(n) # meaning I check up to n=4
    if n == 1      big1
    elseif n == 2  big2
    elseif n == 3  big9
    elseif n == 4 big44
    else (nfib5(n-2)+nfib5(n-1))*n
    end
end

Well, in the OP's functions n::BigInt was necessary to return a BigInt instead of an overflowing Int, but you're right that it can be decoupled. The *n part would still convert and allocate a BigInt, I think. Avoiding all the allocations in n-1 and n-2 helps further on my system:

julia> @time nfib5(40)
 18.999958 seconds (195.44 M allocations: 3.495 GiB, 27.15% gc time)
12306496796223503426182275975073985352177589230680

julia> @time nfib5(40)
 18.529561 seconds (195.44 M allocations: 3.495 GiB, 26.41% gc time)
12306496796223503426182275975073985352177589230680

Only a <5x speedup over the original (but try it on your own system for your own numbers), so there's surely more to it than just interning to save allocations.

There is a specialized *(x::BigInt, y::Int) which should be faster, as BigInt multiplication is essentially long multiplication with Int digits.
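
A quick way to check what a BigInt-times-Int call actually dispatches to is @which; the output isn't shown here since it depends on the Julia version and platform:

using InteractiveUtils  # provides @which outside the REPL

# Inspect which `*` method handles a BigInt × Int product; on a typical 64-bit
# build this should hit a method specialized on a machine-integer argument
# rather than promoting the Int to a BigInt first.
@which big(2) * 3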

The n == 1 and n == 2 checks are not necessary on the hot path, as the 'cached' n = 3 and n = 4 results preclude those cases (unless the function is called with them directly).

I mean, checking for n=3 and n=4 should be first.

That's true, but my thinking was that I'd probably encounter the n == 1 cases more often in the recursive call tree. I haven't tried reversing the cases, but I did write the method as indexing a global tuple of the 4 big integers, and the performance didn't seem to change for me; the checks were that cheap. I just posted the ordered branches because I thought they looked clearer.
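
For reference, that tuple-indexing variant would look roughly like this (a sketch reconstructed from the description, with hypothetical names; not the exact code that was timed):

# Index a global tuple of the four cached BigInt base cases instead of branching
# on each n explicitly; the recursive part is the same as in nfib5.
const NFIB_BASE = (big(1), big(2), big(9), big(44))

nfib_tup(n) = n <= 4 ? NFIB_BASE[n] : (nfib_tup(n-2) + nfib_tup(n-1))*n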

The n == 1 and n == 2 branches are encountered zero times, as the recursion stops at n = 3 and n = 4, but it doesn't matter much (at least when I timed it).

Oh I see now, of course.

I made something similar, but using Union{Int, Vector{Int}}, where for non-small integers the Vector{Int} holds the vector of limbs and is handled by the low-level MPN layer of GMP (instead of the MPZ layer used for BigInt).

But there are a bunch of operations which I haven't implemented yet, so for those I resort to converting to BigInt, which is somewhat slow. It's essentially a packaged version of this PR: https://github.com/JuliaLang/julia/pull/17015, where some global BigInts are used to avoid allocation costs when converting to BigInt. But CI fails on Windows for some reason, which I suspect is because of that. Also, nowadays it's incorrect to use thread-local objects like that, since tasks can migrate between threads, so another solution is needed; the most straightforward would probably be task-local storage.
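
To make the idea concrete, here is a minimal sketch of the "stay in Int while it fits, fall back to the slow path otherwise" part; the names are made up, it doesn't show the MPN limb handling, and it is not the package's actual code:

# The representation described above: either a machine integer or a limb vector.
const IntOrLimbs = Union{Int, Vector{Int}}

# Fast path for two small values: plain Int arithmetic with an overflow check,
# falling back to (slow) BigInt arithmetic only when the result no longer fits.
function sketch_add(x::Int, y::Int)
    s, overflowed = Base.Checked.add_with_overflow(x, y)
    return overflowed ? big(x) + big(y) : s
end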

It's generally faster than BigInt, though not always. Here, though, it looks pretty much worth it:

julia> nfib(n) = n > 2 ? (nfib(n-2)+nfib(n-1))*n : n
nfib (generic function with 1 method)

julia> @time x = nfib(big(40))
 35.327526 seconds (1.13 G allocations: 18.299 GiB, 36.83% gc time)
12306496796223503426182275975073985352177589230680

julia> @time x = nfib(XInt(40))
  1.881859 seconds (46.37 k allocations: 3.538 MiB)
12306496796223503426182275975073985352177589230680

My goal was to release this package once I had implemented a few more operations, but unfortunately I haven't touched the project for 2.5 years. If there is interest, I could revive it and publish it soon in its current state; maybe some people would like to contribute some of the missing functions? Also, for XInt to be reliable, some careful review of the implemented algorithms would be needed.

As for the overhead compared to Int when only small integers are actually used:

julia> @btime nfib(18)
  5.535 μs (0 allocations: 0 bytes)
44750731559645106

julia> @btime nfib(XInt(18))
  46.697 μs (0 allocations: 0 bytes)
44750731559645106

julia> @btime nfib(BigInt(18))
  336.929 μs (28415 allocations: 484.35 KiB)
44750731559645106

That looks great! Definitely publish it (with big warnings that it’s not ready to be used yet).

I’d be very interested in contributing, regardless of the current state. Is this in a repository anywhere?
