Simple recursive Fibonacci example. How to make it faster?

You may be familiar with writing functions like this to be generic:

function mysume(v::AbstractVector)
    s = zero(eltype(v))
    @simd for x in v
        s += x
    end
    s
end

This function should be type stable for all reasonable definitions of +. E.g.:

xf = rand(100);
@code_warntype mysume(xf)
xi = rand(1:100, 100);
@code_warntype mysume(xi)
using StaticArrays
xm = [@SMatrix randn(2,2) for _ in 1:100];
@code_warntype mysume(xm)

You can also write the function like this:

function mysumw(v::AbstractVector{T}) where {T}
    s = zero(T)
    @simd for x in v
        s += x
    end
    s
end

and you’ll get the same result: it uses T, the element type parameter, to make a zero of the appropriate type.

Julia compiles a specialized version for each element type, one for T === Float64, another for T === Int, and another for T === SMatrix{2,2,Float64,4}.

What if instead of using a type parameter T to give us an element type and let us make a zero of the correct type, we used an actual value as a type parameter?

AbstractArrays sort of all do this, giving us their number of dimensions. Check this out:

julia> twice_ndim(x) = 2length(size(x))
twice_ndim (generic function with 1 method)

julia> @code_typed twice_ndim(rand(3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
└──     return 2
) => Int64

julia> @code_typed twice_ndim(rand(3,3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
│       Base.arraysize(x, 2)::Int64
└──     return 4
) => Int64

julia> @code_typed twice_ndim(rand(3,3,3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
│       Base.arraysize(x, 2)::Int64
│       Base.arraysize(x, 3)::Int64
└──     return 6
) => Int64

The code for twice_ndim doesn’t calculate anything (no 2*): it just returns the answer!

Tamas_Papp was taking advantage of this by using a Val type. You can make your own via:

struct MyVal{T} end

The trick: instead of using T to describe the struct, like how an Array{Float64,2}'s type parameters say it (a) holds Float64 elements and (b) has 2 dimensions, MyVal's T can be any value at all.
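For instance, methods can dispatch on the value stored in the parameter. This is a sketch; the `describe` function is hypothetical, not from the thread:

```julia
struct MyVal{T} end
MyVal(x) = MyVal{x}()                 # convenience constructor, like Base.Val

# Dispatch selects a method based on the *value* in the type parameter:
describe(::MyVal{1}) = "the number one"
describe(::MyVal{:a}) = "the symbol :a"

describe(MyVal(1))                    # "the number one"
```

Anything isbits (numbers, symbols, tuples of them) can sit in that parameter slot.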

And just like Array{T,N}'s N let the twice_ndim function compile into simply spitting out the answer, the value parameter lets fibval do the same:

julia> @code_typed fibval(Val(46)) 
CodeInfo(
1 ─     goto #3 if not false
2 ─     nothing::Nothing
3 ┄     return 2971215073
) => Int64

julia> @btime fibval(Val(46)) 
  0.001 ns (0 allocations: 0 bytes)
2971215073

fibval here isn’t doing anything except returning the answer.

The first run, which included compilation, was also very fast. fib is slow because fib(N) normally requires O(2^N) work. However, Tamas_Papp's trick, which is equivalent to memoization, requires just O(N) work, because each fib(::Val{N}) only gets calculated once, with the result saved into a specialized method.
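fibval's definition isn't shown in this excerpt; a minimal sketch consistent with the output above (and with the fib(0) = fib(1) = 1 convention used later in the thread) looks like this:

```julia
# Each N gets its own compiled method. With constant propagation, every
# specialized method compiles down to returning its constant, so only
# O(N) methods are ever compiled, and the call site becomes a constant.
fibval(::Val{0}) = 1
fibval(::Val{1}) = 1
fibval(::Val{N}) where {N} = fibval(Val(N - 1)) + fibval(Val(N - 2))

fibval(Val(46))   # 2971215073, matching the @code_typed output above
```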

Compiling takes vastly longer than simple arithmetic. Dynamic dispatch, needed when the types of everything can't be determined at compile time, also takes much longer (although a dynamic dispatch takes much less time than compilation).
It will normally be better to let the compiler create a function optimized to do arithmetic on values quickly than to compile a version of the function specific to those values. The exceptions are things like StaticArrays, where some of the values, e.g. the sizes of the arrays, can be used to help the compiler compute on the arrays' contents more quickly. Other examples include special matrix properties: is a matrix positive definite, or lower triangular?
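As a concrete example of a matrix property encoded in a type, the LinearAlgebra stdlib's LowerTriangular wrapper puts the property in the type rather than the data, so `\` dispatches to forward substitution instead of a general factorization:

```julia
using LinearAlgebra

A = [2.0 0.0;
     1.0 3.0]
L = LowerTriangular(A)   # the triangular property lives in the type
b = [2.0, 7.0]

x = L \ b                # dispatches to an O(n^2) triangular solve
```

Here x == [1.0, 2.0]; the same `A \ b` without the wrapper would go through a general LU factorization.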

When it comes to values like data (eg, contents of the arrays), it would almost certainly be better to NOT use these values as type parameters. Just write a normal function and assign the results to a variable, or maybe use a memoization library.
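For value-level memoization, a plain Dict cache keeps the data out of the type system entirely. A sketch (packages like Memoize.jl automate this pattern):

```julia
const fib_cache = Dict{Int,Int}(0 => 1, 1 => 1)

# Runtime memoization: each n is computed once and stored as a value,
# with no per-value compilation.
function fib_memo(n::Int)
    get!(fib_cache, n) do
        fib_memo(n - 1) + fib_memo(n - 2)
    end
end

fib_memo(46)   # 2971215073
```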


This is a natural generalization of the function barrier + mutate-or-widen approach used in functions like collect. So, I think it might be natural to have it in Julia: Tail-call optimization and function-barrier -based accumulation in loops - #8 by tkf

Going back to the multi-threaded version of the algorithm that Mason posted, I've been playing with it a bit. I noticed that running @code_warntype on fib shows that fetch(t) introduces an ::Any type into the calculations.

I tried to skim the docs for information about this but could not find anything notable. Maybe making it type stable would improve the performance a bit (is that even possible?).
I'm not sure how to make fetch type stable, though; a Task is not convertible to ::Int.
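One common workaround (an assumption on my part, not Mason's exact code) is to assert the result type at the call site, since fetch on a Task is inferred as Any:

```julia
function fib(n::Int)
    n <= 1 && return 1
    n < 24 && return fib(n - 1) + fib(n - 2)   # serial below the cutoff
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)::Int          # ::Int restores inferrability
end
```

The `::Int` assertion throws at runtime if the task ever returned something else, and lets inference propagate Int through the addition.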

I have also tried a simpler version, without splitting for n>23.

function fib(n::Int64)
    if n <= 1 return 1 end
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)
end

And when you try

fib(46)

it consumes all my computer memory and finally crashes.

The splitting at 23 is to avoid spending more time on the @spawn overhead (multi-threading is not free) than on doing the calculations directly (if you check out Mason's Stack Overflow answer, he goes into much more detail).


If you work out how many tasks you spawned, that makes sense.
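To put a number on it (a sketch): every call with n > 1 spawns one task, so the task count follows the same recursion as fib itself:

```julia
# Number of tasks the unthresholded version spawns while computing fib(n).
# This equals fib(n) - 1, so it grows exponentially.
spawns(n) = n <= 1 ? 0 : 1 + spawns(n - 1) + spawns(n - 2)

spawns(20)   # 10945 tasks, and counting
```

For n = 46 that is fib(46) - 1, nearly three billion tasks, which is why memory runs out.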

Hello,

What could I do to get the program up and running? I only get error messages like this.

march=pentium4? I don't have that ancient foot warmer from 2000 :mask: :slight_smile:

The module looks like this:

module Fib

fib(n, a=0, b=1) = n > 0 ? fib(n-1, b, a+b) : a

Base.@ccallable function julia_main(ARGS::Vector{String})::Cint
    @show fib(10)
    return 0
end

end # module

Why do you make the function ccallable? Which goal do you want to achieve?

For creating binary files with PackageCompilerX. There is a post from Mason where you can see this…

For reference, this was a bug with PackageCompilerX on 32-bit Julia which has now been fixed (it requires gcc 7, though).


OK, thanks again for your help. gcc is now linked to version 7.4; before it was 5.4. Now it runs, but unfortunately it takes a long time to create. My ThinkPad is a bit too old for this :sweat_smile:

best regards
Michael

Is there a reason they use a recursive Fibonacci algorithm for the benchmark? It’s a very inefficient way to compute it (I compared the Python Fib code used in the benchmark test with a non-recursive method and found it was 2200 times slower). Specifically to test speed on recursive tasks maybe?

Yes. The point of the benchmark is not to optimize the Fibonacci algorithm, it’s to test performance of recursion.
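For comparison, the O(n) iterative version (a sketch, using the same fib(0) = fib(1) = 1 convention as the recursive code in this thread) is trivial, which is exactly why the benchmark doesn't use it:

```julia
function fib_iter(n)
    a, b = 1, 1
    for _ in 2:n          # each step advances the pair (fib(k-1), fib(k))
        a, b = b, a + b
    end
    return b
end

fib_iter(46)   # 2971215073, with 45 additions instead of billions of calls
```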


That makes sense. Thanks.


Feels like Ackermann's function would be better suited and would cause fewer questions like this.

function A(m,n)
    if iszero(m)
        n + one(n)
    elseif iszero(n)
        A(m - one(m), one(n))
    else
        A(m - one(m), A(m, n - one(n)))
    end
end

ccode = """
long A(long m, long n){
    long x;
    if (m == 0) {
        x = n + 1;
    } else if (n == 0) {
        x = A(m - 1, 1);
    } else {
        x = A(m - 1, A(m, n - 1));
    }
    return x;
}

""";
open("ackerman.c", "w") do io
    write(io, ccode)
end
run(`gcc -O3 -march=native -shared -fPIC ackerman.c -o libackerman.so`)
const AckerC = joinpath(pwd(), "libackerman.so")
Ac(m::Clong, n::Clong) = @ccall AckerC.A(m::Clong, n::Clong)::Clong

This yields:

julia> @time A(3,3)
  0.000005 seconds
61

julia> @time Ac(3,3)
  0.000003 seconds
61

julia> @time A(4, 1)
  4.537859 seconds
65533

julia> @time Ac(4, 1)
  1.829240 seconds
65533

Wow

The Ackermann function is interesting in that it actually pays off to compile it separately for each value of m, even when compile time is accounted for.

using StaticNumbers
function A(m,n)
    if iszero(m)
        n + one(n)
    elseif iszero(n)
        A(@stat(m - one(m)), one(n))
    else
        A(@stat(m - one(m)), A(m, n - one(n)))
    end
end
@time A(static(4), 1)

This yields more than a 5x speedup on my system.


Running this function on my PC (Windows 10, Julia 1.4.2) gives a stack overflow for A(4,1). :frowning: