The function here is not exactly Fibonacci, so -5 to 256 covers only 4 `nfib` values (1, 2, 9, 44). But it’s still an elephant (maybe a younger one).

Reusing only 2 numbers nearly halves the computing time. That’s the elephant.

Another thing I should’ve considered is that `n-2` and `n-1` occur for most return values, so saving those conversion allocations helps much more than saving the allocations of a couple of return values. We can do that for the 4 values, though this doesn’t elide the allocation of each `n-big2` or `n-big1` result, even when it equals an already-cached value.

Weirdly, the change in the OP to `n-2` and `n-1` doesn’t improve runtime for me (probably version/system differences), but doing it for the 4 values does, by ~2.5x. The allocation counts are more comparable: 1.13 G > 921.01 M > 351.79 M. So this particular elephant isn’t so big.

```
julia> begin
const big1 = big(1)
const big2 = big(2)
const big3 = big(3)
const big9 = big(9)
const big4 = big(4)
const big44 = big(44)
function nfib4(n) # meaning I check up to n=4
    if n == big1 big1
    elseif n == big2 big2
    elseif n == big3 big9
    elseif n == big4 big44
    else (nfib4(n-big2)+nfib4(n-big1))*n
    end
end
end
nfib4 (generic function with 1 method)
julia> @time nfib4(big(40))
36.879051 seconds (351.79 M allocations: 6.990 GiB, 24.84% gc time)
12306496796223503426182275975073985352177589230680
julia> @time nfib4(big(40))
32.710493 seconds (351.79 M allocations: 6.990 GiB, 23.03% gc time)
12306496796223503426182275975073985352177589230680
julia> nfib2(n) = n > big2 ? (nfib2(n-big2)+nfib2(n-big1))*n : n
nfib2 (generic function with 1 method)
julia> @time nfib2(big(40))
91.455750 seconds (921.01 M allocations: 18.299 GiB, 25.44% gc time)
12306496796223503426182275975073985352177589230680
julia> @time nfib2(big(40))
86.134056 seconds (921.01 M allocations: 18.299 GiB, 27.48% gc time)
12306496796223503426182275975073985352177589230680
julia> nfib(n) = n > 2 ? (nfib(n-2)+nfib(n-1))*n : n
nfib (generic function with 1 method)
julia> @time nfib(big(40))
89.184372 seconds (1.13 G allocations: 18.299 GiB, 27.74% gc time)
12306496796223503426182275975073985352177589230680
julia> @time nfib(big(40))
90.009020 seconds (1.13 G allocations: 18.299 GiB, 27.19% gc time)
12306496796223503426182275975073985352177589230680
```

It isn’t clear to me why the parameter of `nfib` shouldn’t always be an `Int` (it is never big). Only the return values should be `BigInt`s. For example:

```
const big1 = big(1)
const big2 = big(2)
const big9 = big(9)
const big44 = big(44)
function nfib5(n) # meaning I check up to n=4
    if n == 1 big1
    elseif n == 2 big2
    elseif n == 3 big9
    elseif n == 4 big44
    else (nfib5(n-2)+nfib5(n-1))*n
    end
end
```

Well, in the OP’s functions `n::BigInt` was necessary to return a `BigInt` instead of overflowing `Int`, but you’re right that it can be decoupled. The `*n` part would convert and allocate a `BigInt`, I think. Avoiding all allocations in `n-1` and `n-2` helps further on my system:

```
julia> @time nfib5(40)
18.999958 seconds (195.44 M allocations: 3.495 GiB, 27.15% gc time)
12306496796223503426182275975073985352177589230680
julia> @time nfib5(40)
18.529561 seconds (195.44 M allocations: 3.495 GiB, 26.41% gc time)
12306496796223503426182275975073985352177589230680
```

Only a <5x speedup over the original (but try it on your own system for your own numbers), so there’s surely more to it than just interning to save allocations.

There is a specialized `*(x::BigInt, y::Int)`, which should be faster, as `BigInt` multiplication is essentially long multiplication with `Int` digits.
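A minimal check of that claim (a sketch, assuming Julia’s Base GMP wrappers; `@which` just reports which method the call dispatches to):

```julia
# Sketch: multiplying a BigInt by an Int should dispatch to a small-integer
# method in Base's GMP wrapper, rather than first allocating big(3).
x = big(2)^200

println(@which x * 3)  # expect a (::BigInt, small-integer) method, not (::BigInt, ::BigInt)

# Both paths agree on the result either way.
println(x * 3 == x * big(3))
```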

These two checks are not necessary on the hotter path, as the ‘cached’ `n=3` and `n=4` results preclude those cases (unless directly called).

I mean, checking for `n=3` and `n=4` should come first.

That’s true, but my thinking was that I probably encounter the `n==1` cases more often in the recursive call tree. I haven’t tried reversing the cases, but I did write the method as indexing a global tuple of the 4 big integers, and the performance didn’t seem to change for me; the checks were that cheap. I just posted the ordered branches because I thought they looked clearer.
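For reference, a minimal sketch of that tuple-lookup variant (the name `nfib4t` and the exact layout are my guesses, not the poster’s code):

```julia
# Cached results of this nfib variant for n = 1:4, namely 1, 2, 9, 44.
const NFIB_BASE = (big(1), big(2), big(9), big(44))

# One bounds check and a tuple index replace the four equality branches.
nfib4t(n::Int) = n <= 4 ? NFIB_BASE[n] : (nfib4t(n - 2) + nfib4t(n - 1)) * n
```

Same recursion as `nfib4` above, but with an `Int` argument and a single comparison on the hot path.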

They are encountered zero times, as the recursion stops at `n=3,4`, but it doesn’t matter much (at least when I timed it).

Oh I see now, of course.

I made something similar, but using `Union{Int, Vector{Int}}`, where for non-small integers the `Vector{Int}` represents the vector of limbs and is handled by the low-level MPN layer of GMP (instead of the MPZ layer used for `BigInt`).

But there are a bunch of operations which I haven’t implemented yet, so for those I resort to converting to `BigInt`, which is somewhat slow. It’s essentially a packaged version of this PR: https://github.com/JuliaLang/julia/pull/17015, where some global `BigInt`s are used to avoid allocation costs when converting to `BigInt`. But CI fails on Windows for some reason, I suspect because of that, and nowadays it’s incorrect to use thread-local objects like that, as tasks can migrate, so another solution is needed; the most straightforward would probably be task-local storage.
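A minimal sketch of that task-local-storage idea (the name `scratch` is mine; this only illustrates the mechanism, not the PR’s actual code):

```julia
# Fetch (or lazily create) a scratch BigInt tied to the current task, so
# concurrent tasks never share it, even if they migrate between threads.
scratch() = get!(() -> BigInt(), task_local_storage(), :xint_scratch)::BigInt
```

Within one task, `scratch()` always returns the same object, so it can be reused as a conversion buffer without a per-call allocation.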

It’s generally faster than `BigInt`, but not always. Here, though, it looks pretty much worth it:

```
julia> nfib(n) = n > 2 ? (nfib(n-2)+nfib(n-1))*n : n
nfib (generic function with 1 method)
julia> @time x = nfib(big(40))
35.327526 seconds (1.13 G allocations: 18.299 GiB, 36.83% gc time)
12306496796223503426182275975073985352177589230680
julia> @time x = nfib(XInt(40))
1.881859 seconds (46.37 k allocations: 3.538 MiB)
12306496796223503426182275975073985352177589230680
```

My goal was to release this package once I had implemented a few more operations, but unfortunately I haven’t touched this project for 2.5 years. If there is interest, I could revive it and publish it soon in its current state; maybe some people would like to contribute some of the missing functions? Also, for `XInt` to be reliable, some careful review of the implemented algorithms would be needed.

As for the overhead compared to `Int` when only small integers are actually used:

```
julia> @btime nfib(18)
5.535 μs (0 allocations: 0 bytes)
44750731559645106
julia> @btime nfib(XInt(18))
46.697 μs (0 allocations: 0 bytes)
44750731559645106
julia> @btime nfib(BigInt(18))
336.929 μs (28415 allocations: 484.35 KiB)
44750731559645106
```

That looks great! Definitely publish it (with big warnings that it’s not ready to be used yet).

I’d be very interested in contributing, regardless of the current state. Is this in a repository anywhere?