using BenchmarkTools

function fib(n)
    if n <= 1
        return 1
    end
    return fib(n - 1) + fib(n - 2)
end

@btime fib(46)
Julia needs 10.716 seconds, Nim only 0.429 s, and C++ 3.856 s.
Can we do anything to improve the performance without modifying the algorithm?
Declaring types with function fib(n::Int64) gives almost the same results.
Using @simd, @inbounds, or @fastmath doesn't seem to help here.
Any simple way to make it multithreaded?
julia> Threads.nthreads()
6

julia> function F(n)
           if n < 2
               return n
           else
               return F(n - 1) + F(n - 2)
           end
       end
F (generic function with 1 method)

julia> function fib(n::Int)
           if n > 23
               t = Threads.@spawn fib(n - 2)
               return fib(n - 1) + fetch(t)
           else
               return F(n)
           end
       end
fib (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime fib(46)
  1.506 s (795641 allocations: 57.93 MiB)
1836311903

julia> @btime F(46)
  8.591 s (0 allocations: 0 bytes)
1836311903
Recursive Fibonacci is one of the multilanguage benchmark comparisons featured in the Julia Micro-Benchmarks. There, Julia's run time is approximately 1.5 times C's, excluding the first run, which includes compilation. Not bad.
Raw times for fib(20)
fortran 0.0225
c 0.0227
lua 0.027
julia 0.0302
rust 0.0392
go 0.0410
javascript 0.08
java 0.0827
matlab 0.4
python 2.143
mathematica 3.002
r 6.0
octave 228.3
No, it wouldn't; you can time it yourself and it's still much slower.
This conversation on the Nim forum suggests the Nim people believe their compiler hoisted at least some of the calculation to compile time, which is likely true of the C/C++ compilers as well.
Given that the other compilers may have done it tacitly, here's me "encouraging" Julia to hoist the calculation with Base.@pure and then statically compiling it with PackageCompilerX.jl:
[mason@mason-pc ~]$ cat Fib/src/Fib.jl
module Fib

Base.@pure fib(n::Int) = n < 2 ? n : fib(n - 2) + fib(n - 1)

Base.@ccallable function julia_main()::Cint
    @show fib(46)
    return 0
end

end # module
[mason@mason-pc ~]$ julia --banner=no
julia> using PackageCompilerX; create_app("Fib", "FibCompiled")
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Updating `~/Fib/Project.toml`
[no changes]
[ Info: PackageCompilerX: creating base system image (incremental=false)...
[ Info: PackageCompilerX: creating system image object file, this might take a while...
[mason@mason-pc ~]$ time FibCompiled/bin/Fib
fib(46) = 1836311903
real 0m0.320s
user 0m0.375s
sys 0m0.501s
Disclaimer:
Base.@pure is known by the state of California to cause cancer and birth defects, and I am not condoning its (ab)use. Do not use it in anything important unless Jameson tells you it's okay.
Note that Julia doesn’t do tail call optimizations, while most C and C++ compilers do. Therefore the simple version of the Julia fib function will be slower.
You should be able to replicate the C/C++ speed by manually unrolling one of the recursive calls. That is, replace the two calls with a for loop containing one call. This is what the C/C++ compilers do.
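A minimal sketch of that unrolling, under the F(0) = 0, F(1) = 1 convention used earlier in the thread (the name fib_unrolled is mine, not from the post):

```julia
# Manual unrolling as described above: the fib(n - 1) call is replaced
# by a loop, leaving a single recursive call per iteration.
# Uses the convention fib(0) = 0, fib(1) = 1, matching F above.
function fib_unrolled(n)
    acc = 0
    while n >= 2
        acc += fib_unrolled(n - 2)  # the one remaining recursive call
        n -= 1                      # the loop stands in for fib(n - 1)
    end
    return acc + n                  # n is now 0 or 1: the base case
end
```

This is still exponential time, of course; the point is only that one of the two recursions becomes iteration, which is roughly the transformation the C/C++ optimizers apply.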
We should probably refactor the benchmarks so that the values to compute come from an external input instead of being in the code since it’s very annoying to convince various compilers not to specialize on constant values, which is not the point of the benchmarks. But man, what a drag. Almost as bad as writing them all in the first place was.
I don't understand why you are making this comparison, when the actual repository has separate tables for language categories, e.g. by compilation strategy and type system. That is the right approach.
Julia 1.1 is the second fastest in its category there by a small margin, so I am not sure that there are many low-hanging fruits. Comparing to C++ (without compilation time, while including it for Julia) is a nonsensical exercise (and, to be clear, the repository doesn’t actually propose doing this).
TL;DR: "tail call optimization" is not actually a performance optimization; it is a memory optimization, and it is only necessary for:

1. writing recursive algorithms without blowing up the stack;
2. using function calls to express finite automata.
The first can be written as efficiently using iteration, while the second can be expressed as efficiently using an explicit finite automaton. So there is no performance reason for TCO, it is an expressiveness feature for guaranteeing that you can express certain things using recursion rather than iteration or state machines without running out of memory.
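To illustrate the first point, here is the usual iterative rewrite (fib_iter is my name for it), which needs neither TCO nor a growing stack:

```julia
# Iterative Fibonacci: constant memory, linear time, no recursion at all.
function fib_iter(n)
    a, b = 0, 1          # fib(0), fib(1)
    for _ in 1:n
        a, b = b, a + b  # slide the window forward one step
    end
    return a
end
```

fib_iter(46) returns 1836311903, matching F(46) from the transcript above.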
The compilation time and startup overhead is negligible here and does not account for the speed differences. Without compilation and startup time, the benchmark takes 8.4 seconds on my machine.
A large part of Julia evangelism is focused exactly on comparing Julia to C/C++, and many users were convinced to use Julia on the premise that it's nearly as fast as C yet easy to write. It's only natural to want to know why Julia does so poorly in comparison here.
As noted above, it is probably TCO and/or unrolling.
We can also do all kinds of tricks, eg
function fibval(::Val{N}) where N
    if N ≤ 1
        1
    else
        fibval(Val{N - 1}()) + fibval(Val{N - 2}())
    end
end

julia> @time fibval(Val(46)) # first run, includes compilation time
  0.057667 seconds (166.50 k allocations: 9.972 MiB, 22.78% gc time)

julia> @time fibval(Val(46)) # let's benchmark Base.show
  0.000182 seconds (4 allocations: 160 bytes)
These things are fun, but they tell you little about real-life performance of 10K LOC applications, which is what matters in practice — and of course, developer time.
And since Julia uses multiple dispatch, when Tamas calls fibval(Val(46)), Julia will end up compiling new methods for inputs Val(0) through Val(46), essentially memoizing the calculation like @DNF's example, except it implicitly stores the outcomes in the type system.
This approach is often useful for hoisting calculations to compile time, but beware of doing something like this dynamically, because dispatch is very, very slow compared to if/else when it can't be predicted at compile time.
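For comparison, a sketch of doing the same memoization explicitly with a Dict instead of the type system (FIB_CACHE and fib_memo are my names, not from the thread):

```julia
# Explicit memoization: each Fibonacci number is computed once and cached,
# so fib_memo makes O(n) recursive calls instead of exponentially many.
# Uses fib(0) = 0, fib(1) = 1, matching F above.
const FIB_CACHE = Dict{Int,Int}(0 => 0, 1 => 1)

fib_memo(n::Int) = get!(FIB_CACHE, n) do
    fib_memo(n - 1) + fib_memo(n - 2)  # computed only on a cache miss
end
```

fib_memo(46) returns 1836311903 near-instantly, but it pays for that with mutable global state, which the Val version hides inside the method table.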