Simple recursive Fibonaci example. How to make it faster?

Juan · December 17, 2019, 2:59am

I’ve found this benchmark comparing several languages with a simple Fibonaci code.

function fib(n)
  if n <= 1 return 1 end
  return fib(n - 1) + fib(n - 2)
end

@btime(fib(46))

Julia needs 10.716 seconds.
Nim only 0.429.
C++ 3.856.

Can we do anything to improve the performance without modifying the algorithm?
Declaring types with function fib(n:Int64) gives almost the same results.
using @simd, @inbounds or @fastmath doesn’t seem to be useful here.
Any simple way to make it multithreaded?

Oscar_Smith · December 17, 2019, 3:07am

That is that this is measuring compilation time. If you, for example, ran fib(1) first, Julia would be similar speed to c.

Mason · December 17, 2019, 3:53am

I wrote this answer: multithreading - Multi-threaded parallelism performance problem with Fibonacci sequence in Julia (1.3) - Stack Overflow a couple weeks ago showing how to use the new multithreading system to get substantial speedups in the recursive Fibonacci sequence. Not sure if this counts for this challenge but here you go:

julia> Threads.nthreads()
6

julia> function F(n)
           if n < 2
               return n
           else
               return F(n-1)+F(n-2)
           end
       end
F (generic function with 1 method)

julia> function fib(n::Int)
           if n > 23
              t = Threads.@spawn fib(n - 2)
              return fib(n - 1) + fetch(t)
           else
              return F(n)
           end
       end
fib (generic function with 1 method)


julia> using BenchmarkTools

julia> @btime fib(46)
  1.506 s (795641 allocations: 57.93 MiB)
1836311903

julia> @btime F(46)
  8.591 s (0 allocations: 0 bytes)
1836311903

John_Gibson · December 17, 2019, 4:05am

Recursive Fibonacci is one of the multilanguage benchmark comparisons featured at Julia Micro-Benchmarks. In those, Julia run time is 1.5 times C run time, approximately, excluding the first run which includes compilation. Not bad.

Raw times for fib(20)

fortran     0.0225
c           0.0227
lua         0.027
julia       0.0302
rust        0.0392
go          0.0410
javascript  0.08
java        0.0827
matlab      0.4
python      2.143
mathematica 3.002
r           6.0
octave    228.3

Mason · December 17, 2019, 6:28am

No it wouldn’t, you can time it yourself and it’s still much slower.

This conversation Nim forum suggest the Nim people believe their compiler hoisted at least some of the calculation to compile time, which is likely true of the C/C++ compiler as well.

Given that the other compilers may have done it tacitly, here’s me “encouraging” julia to hoist the calculation with Base.@pure and then statically compiling it with PackageCompilerX.jl

[mason@mason-pc ~]$ cat Fib/src/Fib.jl
module Fib

Base.@pure fib(n::Int) = n < 2 ? n : fib(n - 2) + fib(n - 1)

Base.@ccallable function julia_main()::Cint
    @show fib(46)
    return 0
end

end # module

[mason@mason-pc ~]$ julia --banner=no
julia> using PackageCompilerX; create_app("Fib", "FibCompiled")
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `~/Fib/Project.toml`
 [no changes]
[ Info: PackageCompilerX: creating base system image (incremental=false)...
[ Info: PackageCompilerX: creating system image object file, this might take a while...
[ Info: PackageCompilerX: creating system image object file, this might take a while...

[mason@mason-pc ~]$ time FibCompiled/bin/Fib
fib(46) = 1836311903

real    0m0.320s
user    0m0.375s
sys     0m0.501s

Disclaimer:

Base.@pure is known by the state of California to cause cancer and birth defects and I am not condoning it’s (ab)use. Do not use it in anything important unless Jameson tells you it’s okay.

DNF · December 17, 2019, 8:45am

Memoization helps a lot here. But is it then the same algorithm? Probably not.

julia> using Memoize

julia> @memoize function fibmem(n)
         if n <= 1 return 1 end
         return fibmem(n - 1) + fibmem(n - 2)
       end

julia> fibmem(1)  # compile
1

julia> @time fibmem(46)
  0.000031 seconds (222 allocations: 5.281 KiB)
2971215073

You have to use @time here, not @btime, since in the latter case, it will just look up the old answer when iterating:

julia> @btime fibmem(46)
  92.659 ns (0 allocations: 0 bytes)
2971215073

Elrod · December 17, 2019, 9:39am

Note that Julia doesn’t do tail call optimizations, while most C and C++ compilers do. Therefore the simple version of the Julia fib function will be slower.

You should be able to replicate the C/C++ speed by manually unrolling one of the recursive calls. That is, replace the two calls with a for loop containing one call. This is what the C/C++ compilers do.

StefanKarpinski · December 17, 2019, 10:21am

We should probably refactor the benchmarks so that the values to compute come from an external input instead of being in the code since it’s very annoying to convince various compilers not to specialize on constant values, which is not the point of the benchmarks. But man, what a drag. Almost as bad as writing them all in the first place was.

Juan · December 17, 2019, 11:45am

I already knew that and run it several times, and the results of that second benchmark corresponds to the alleged benchmarks.

Juan · December 17, 2019, 11:54am

I guess other could think this is cheating becacuse you are not really benchmarking what Julia is able to do but what the programmer does.

Will Julia offer tail call optimization in future?

GunnarFarneback · December 17, 2019, 12:15pm

Possibly, to some extent, but not very likely unless someone steps up and implements it. Read the full story at https://github.com/JuliaLang/julia/issues/4964.

Tamas_Papp · December 17, 2019, 1:16pm

I don’t understand why you are making this comparison, while the actual repository has separate tables for language categories, eg by compilation and type systems. That is the right approach.

Julia 1.1 is the second fastest in its category there by a small margin, so I am not sure that there are many low-hanging fruits. Comparing to C++ (without compilation time, while including it for Julia) is a nonsensical exercise (and, to be clear, the repository doesn’t actually propose doing this).

StefanKarpinski · December 17, 2019, 3:05pm

TLDR: “tail call optimization” is not actually a performance optimization, it is a memory optimization and only necessary for:

Writing recursive algorithms without blowing up the stack;
Using function calls to express finite automata.

The first can be written as efficiently using iteration, while the second can be expressed as efficiently using an explicit finite automaton. So there is no performance reason for TCO, it is an expressiveness feature for guaranteeing that you can express certain things using recursion rather than iteration or state machines without running out of memory.

Mason · December 17, 2019, 3:06pm

The compilation time and startup overhead is negligible here and does not account for the speed differences. Without compilation and startup time, the benchmark takes 8.4 seconds on my machine.

A large part of Julia evangelism is focused exactly on comparing Julia to C/++, and many users were convinced to use julia on the premise that it’s nearly as fast as C yet easy to write. It’s only natural to want to know why Julia does so poorly in comparison here.

Tamas_Papp · December 17, 2019, 3:42pm

As noted above, it is probably TCO and/or unrolling.

We can also do all kinds of tricks, eg

function fibval(::Val{N}) where N
    if N ≤ 1
        1
    else
        fibval(Val{N - 1}()) + fibval(Val{N - 2}())
    end
end

julia> @time fibval(Val(46)) # first run, includes complilation time
  0.057667 seconds (166.50 k allocations: 9.972 MiB, 22.78% gc time)

julia> @time fibval(Val(46)) # let's benchmark Base.show
  0.000182 seconds (4 allocations: 160 bytes)

These things are fun, but they tell you little about real-life performance of 10K LOC applications, which is what matters in practice — and of course, developer time.

Mason · December 17, 2019, 4:12pm

Here’s how you can semi-manually get the tail call elimination.

julia> fib(n, a=0, b=1) = n > 0 ? fib(n-1, b, a+b) : a
fib (generic function with 3 methods)

julia> @btime fib($46)
  64.136 ns (0 allocations: 0 bytes)
1836311903

and the version compiled to an app:

[mason@mason-pc ~]$ cat Fib/src/Fib.jl
module Fib

fib(n, a=0, b=1) = n > 0 ? fib(n-1, b, a+b) : a

Base.@ccallable function julia_main()::Cint
    @show fib(46)
    return 0
end

end # module

[mason@mason-pc ~]$ julia -e 'using PackageCompilerX; create_app("Fib", "FibCompiled")'
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `~/Fib/Project.toml`
 [no changes]
[ Info: PackageCompilerX: creating base system image (incremental=false)...
[ Info: PackageCompilerX: creating system image object file, this might take a while...
[ Info: PackageCompilerX: creating system image object file, this might take a while...

[mason@mason-pc ~]$ time FibCompiled/bin/Fib
fib(46) = 1836311903

real    0m0.330s
user    0m0.384s
sys     0m0.505s

but again, this is cheating since it’s not the algorithm asked for, even if other languages might have done something similar under the hood.

Juan · December 17, 2019, 4:39pm

Could you explain fibval(::Val{N}) where N for dummies (me), please?

Tamas_Papp · December 17, 2019, 4:40pm

https://docs.julialang.org/en/v1/manual/types/#"Value-types"-1

Mason · December 17, 2019, 4:46pm

Val is just a parametric type that can values composed of bits like numbers in it’s parameter, i.e.

julia> Val(10)
Val{10}()

julia> typeof(Val(10))
Val{10}

and since julia uses multiple dispatch, when Tamas calls fib_val(Val(46)), julia will end up making new methods for inputs Val(0) through to Val(46), essentially memoizing the calcuation like @DNF’s example, except it’s implicitly storing the outcomes in the type system.

This approach is often useful for hoisting calculations to compile time, but beware doing something like this dynamically because dispatch is very very slow compared to if-else if it can’t be predicted at compile time.

Juan · December 17, 2019, 7:40pm

What about where N?
What type of variable is defining?

Could you also give an example of using Val that doesn’t work or is much slower?

Topic		Replies	Views
Recursive Fibonacci Benchmark using top languages on Github Performance recursion	11	2805	September 29, 2018
Yet another TCO thread Performance recursion	5	2367	May 25, 2022
Performance comparison, Julia is a slower than Mojo, am I doing something wrong? Performance question , mojo	44	7739	September 28, 2023
Julia Performance vs. C Performance	8	4110	June 15, 2018
How to speed up this simple code? Multithreading, simd, inbounds Performance	39	6925	January 29, 2019

Simple recursive Fibonaci example. How to make it faster?

Related topics