Simple recursive Fibonacci example. How to make it faster?

You may be familiar with writing functions like this to be generic:

function mysume(v::AbstractVector)
    s = zero(eltype(v))
    @simd for x in v
        s += x
    end
    s
end

This function should be type stable for all reasonable definitions of +. E.g.:

xf = rand(100);
@code_warntype mysume(xf)
xi = rand(1:100, 100);
@code_warntype mysume(xi)
using StaticArrays
xm = [@SMatrix randn(2,2) for _ in 1:100];
@code_warntype mysume(xm)

You can also write the function like this:

function mysumw(v::AbstractVector{T}) where {T}
    s = zero(T)
    @simd for x in v
        s += x
    end
    s
end

and you’ll get the same result: it uses T, the element type parameter, to make a zero of the appropriate type.

Julia compiles a specialized version for each element type, one for T === Float64, another for T === Int, and another for T === SMatrix{2,2,Float64,4}.

What if instead of using a type parameter T to give us an element type and let us make a zero of the correct type, we used an actual value as a type parameter?

AbstractArrays sort of all do this, giving us their number of dimensions. Check this out:

julia> twice_ndim(x) = 2length(size(x))
twice_ndim (generic function with 1 method)

julia> @code_typed twice_ndim(rand(3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
└──     return 2
) => Int64

julia> @code_typed twice_ndim(rand(3,3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
│       Base.arraysize(x, 2)::Int64
└──     return 4
) => Int64

julia> @code_typed twice_ndim(rand(3,3,3))
CodeInfo(
1 ─     Base.arraysize(x, 1)::Int64
│       Base.arraysize(x, 2)::Int64
│       Base.arraysize(x, 3)::Int64
└──     return 6
) => Int64

The code for twice_ndim doesn’t calculate anything (no 2*): it just returns the answer!

Tamas_Papp was taking advantage of this by using a Val type. You can make your own via:

struct MyVal{T} end

The trick: instead of using T to describe the struct, like how an Array{Float64,2}'s type parameters say it (a) holds Float64 elements and (b) has 2 dimensions, MyVal's T can be any value at all.
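For instance, methods can dispatch on the value stored in the parameter. This is a sketch; the `describe` function is hypothetical, not from the thread:

```julia
struct MyVal{T} end
MyVal(x) = MyVal{x}()                 # convenience constructor, like Base.Val

# Dispatch selects a method based on the *value* in the type parameter:
describe(::MyVal{1}) = "the number one"
describe(::MyVal{:a}) = "the symbol :a"

describe(MyVal(1))                    # "the number one"
```

Anything isbits (numbers, symbols, tuples of them) can sit in that parameter slot.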

And just like Array{T,N}'s N let the twice_ndim function compile into simply spitting out the answer, the value parameter lets fibval do the same:

julia> @code_typed fibval(Val(46)) 
CodeInfo(
1 ─     goto #3 if not false
2 ─     nothing::Nothing
3 ┄     return 2971215073
) => Int64

julia> @btime fibval(Val(46)) 
  0.001 ns (0 allocations: 0 bytes)
2971215073

fibval here isn’t doing anything except returning the answer.

The first run, which included compilation, was also very fast. fib is slow because fib(N) normally requires O(2^N) work. However, Tamas_Papp's trick, which is equivalent to memoization, requires just O(N) work, because each fib(::Val{N}) only gets calculated once, with the result saved into a specialized method.
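fibval's definition isn't shown in this excerpt; a minimal sketch consistent with the output above (and with the fib(0) = fib(1) = 1 convention used later in the thread) looks like this:

```julia
# Each N gets its own compiled method. With constant propagation, every
# specialized method compiles down to returning its constant, so only
# O(N) methods are ever compiled, and the call site becomes a constant.
fibval(::Val{0}) = 1
fibval(::Val{1}) = 1
fibval(::Val{N}) where {N} = fibval(Val(N - 1)) + fibval(Val(N - 2))

fibval(Val(46))   # 2971215073, matching the @code_typed output above
```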

Compiling takes vastly longer than simple arithmetic. Dynamic dispatch, needed when the types of everything can't be determined at compile time, also takes much longer (although a dynamic dispatch takes much less time than compilation).
It will normally be better to let the compiler create a function optimized to do arithmetic on values quickly than to compile a version of the function specific to those values. The exceptions are things like StaticArrays, where some of the values, e.g. the sizes of the arrays, can be used to help the compiler compute on the arrays' contents more quickly. Other examples include special matrix properties: is a matrix positive definite, or lower triangular?
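As a concrete example of a matrix property encoded in a type, the LinearAlgebra stdlib's LowerTriangular wrapper puts the property in the type rather than the data, so `\` dispatches to forward substitution instead of a general factorization:

```julia
using LinearAlgebra

A = [2.0 0.0;
     1.0 3.0]
L = LowerTriangular(A)   # the triangular property lives in the type
b = [2.0, 7.0]

x = L \ b                # dispatches to an O(n^2) triangular solve
```

Here x == [1.0, 2.0]; the same `A \ b` without the wrapper would go through a general LU factorization.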

When it comes to values like data (eg, contents of the arrays), it would almost certainly be better to NOT use these values as type parameters. Just write a normal function and assign the results to a variable, or maybe use a memoization library.
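For value-level memoization, a plain Dict cache keeps the data out of the type system entirely. A sketch (packages like Memoize.jl automate this pattern):

```julia
const fib_cache = Dict{Int,Int}(0 => 1, 1 => 1)

# Runtime memoization: each n is computed once and stored as a value,
# with no per-value compilation.
function fib_memo(n::Int)
    get!(fib_cache, n) do
        fib_memo(n - 1) + fib_memo(n - 2)
    end
end

fib_memo(46)   # 2971215073
```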


This is a natural generalization of the function barrier + mutate-or-widen approach used in functions like collect. So, I think it might be natural to have it in Julia: Tail-call optimization and function-barrier -based accumulation in loops - #8 by tkf

Going back to the multi-threaded version of the algorithm that Mason posted, I've been playing with it a bit. I noticed that running @code_warntype on fib shows that fetch(t) introduces an ::Any type into the calculations.

I tried to skim the docs for information about this but could not find anything notable. Maybe making it type stable would improve the performance a bit (is that even possible?).
I'm not sure how to make fetch type stable, though; a Task is not convertible to ::Int.
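One common workaround (an assumption on my part, not Mason's exact code) is to assert the result type at the call site, since fetch on a Task is inferred as Any:

```julia
function fib(n::Int)
    n <= 1 && return 1
    n < 24 && return fib(n - 1) + fib(n - 2)   # serial below the cutoff
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)::Int          # ::Int restores inferrability
end
```

The `::Int` assertion throws at runtime if the task ever returned something else, and lets inference propagate Int through the addition.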

I have also tried a simpler version, without splitting for n>23.

function fib(n::Int64)
    if n <= 1 return 1 end
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)
end

And when you try

fib(46)

it consumes all my computer memory and finally crashes.

The splitting at 23 is to avoid spending more time on the @spawn overhead (multi-threading is not free) than on doing the calculations directly (if you check out Mason's Stack Overflow answer, he goes into much more detail).


If you work out how many tasks you spawned, that makes sense.
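To put a number on it (a sketch): every call with n > 1 spawns one task, so the task count follows the same recursion as fib itself:

```julia
# Number of tasks the unthresholded version spawns while computing fib(n).
# This equals fib(n) - 1, so it grows exponentially.
spawns(n) = n <= 1 ? 0 : 1 + spawns(n - 1) + spawns(n - 2)

spawns(20)   # 10945 tasks, and counting
```

For n = 46 that is fib(46) - 1, nearly three billion tasks, which is why memory runs out.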

Hello,

What could I do to get the program up and running? I only get error messages like this.

march=pentium4? I don't have that ancient foot warmer from 2000 :mask: :slight_smile:

The module looks like this:

module Fib

fib(n, a=0, b=1) = n > 0 ? fib(n-1, b, a+b) : a

Base.@ccallable function julia_main(ARGS::Vector{String})::Cint
    @show fib(10)
    return 0
end

end # module

Why do you make the function ccallable? Which goal do you want to achieve?

For creating binary files with PackageCompilerX. There is a post from Mason where you can see this…

For reference, this was a bug with PackageCompilerX on 32-bit Julia which has now been fixed (it requires gcc 7, though).


OK, thanks again for your help. gcc is now linked to version 7.4; before it was 5.4. Now it runs, but unfortunately it takes a long time to create. My ThinkPad is a bit too old for this :sweat_smile:

best regards
Michael

Is there a reason they use a recursive Fibonacci algorithm for the benchmark? It’s a very inefficient way to compute it (I compared the Python Fib code used in the benchmark test with a non-recursive method and found it was 2200 times slower). Specifically to test speed on recursive tasks maybe?

Yes. The point of the benchmark is not to optimize the Fibonacci algorithm, it’s to test performance of recursion.
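For comparison, the O(n) iterative version (a sketch, using the same fib(0) = fib(1) = 1 convention as the recursive code in this thread) is trivial, which is exactly why the benchmark doesn't use it:

```julia
function fib_iter(n)
    a, b = 1, 1
    for _ in 2:n          # each step advances the pair (fib(k-1), fib(k))
        a, b = b, a + b
    end
    return b
end

fib_iter(46)   # 2971215073, with 45 additions instead of billions of calls
```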


That makes sense. Thanks.


Feels like Ackermann's function would be better suited and would cause fewer questions like this.

function A(m,n)
    if iszero(m)
        n + one(n)
    elseif iszero(n)
        A(m - one(m), one(n))
    else
        A(m - one(m), A(m, n - one(n)))
    end
end

ccode = """
long A(long m, long n){
    long x;
    if (m == 0) {
        x = n + 1;
    } else if (n == 0) {
        x = A(m - 1, 1);
    } else {
        x = A(m - 1, A(m, n - 1));
    }
    return x;
}

""";
open("ackerman.c", "w") do io
    write(io, ccode)
end
run(`gcc -O3 -march=native -shared -fPIC ackerman.c -o libackerman.so`)
const AckerC = joinpath(pwd(), "libackerman.so")
Ac(m::Clong, n::Clong) = @ccall AckerC.A(m::Clong, n::Clong)::Clong

This yields:

julia> @time A(3,3)
  0.000005 seconds
61

julia> @time Ac(3,3)
  0.000003 seconds
61

julia> @time A(4, 1)
  4.537859 seconds
65533

julia> @time Ac(4, 1)
  1.829240 seconds
65533

Wow

The Ackermann function is interesting in that it actually pays off to compile it separately for each value of m, even when compile time is accounted for.

using StaticNumbers
function A(m,n)
    if iszero(m)
        n + one(n)
    elseif iszero(n)
        A(@stat(m - one(m)), one(n))
    else
        A(@stat(m - one(m)), A(m, n - one(n)))
    end
end
@time A(static(4), 1)

This yields more than a 5x speedup on my system.


Running this function on my PC (Windows 10, Julia 1.4.2) gives a stack overflow for A(4,1). :frowning: