Why did the doubly recursive fibonacci microbenchmark become relatively slower?

I pulled out the fib benchmark, using both @btime from BenchmarkTools.jl and the older @timeit macro from the Julia Microbenchmark.

If I use Clang (Apple clang version 17.0.0), which like Julia uses LLVM, then Julia (version 1.12.5, libLLVM-18.1.7) is still about 30% slower than C:

Julia, Julia Cint, and C (via @btime):
  12.750 μs (0 allocations: 0 bytes)
  12.791 μs (0 allocations: 0 bytes)
  9.875 μs (0 allocations: 0 bytes)

Julia, Julia Cint, and C (via @timeit, times in ms):
julia,Julia Int,0.013416
julia,Julia Cint,0.014500
julia,C,0.010834

(this is on an Apple M4).

Code
using Test, BenchmarkTools

fib(n) = n < oftype(n, 2) ? n : fib(n - oftype(n, 1)) + fib(n - oftype(n, 2))

@test fib(20) == 6765
@test fib(Cint(20))::Cint == 6765

using BenchmarkTools

import Libdl
const Clib = "libfib"
Cfib = """
int fib(int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
}
"""
open(`cc -fPIC -O3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, Cfib)
end

cfib(n) = @ccall Clib.fib(n::Cint)::Cint

@test cfib(20) == 6765

const NITER = 5
using Printf: @printf
macro timeit(ex, name)
    quote
        t = zeros(NITER)
        for i = 0:NITER
            e = 1000 * (@elapsed $(esc(ex)))
            if i > 0
                t[i] = e  # i == 0 is JIT warmup
            end
        end
        @printf "julia,%s,%f\n" $(esc(name)) minimum(t)
        GC.gc()
    end
end

println("Julia, Julia Cint, and C (via @btime):")
@btime fib(20)
@btime fib(Cint(20))
@btime cfib(20)

println("\nJulia, Julia Cint, and C (via @timeit, times in ms):")
@timeit fib(20) "Julia Int"
@timeit fib(Cint(20)) "Julia Cint"
@timeit cfib(Cint(20)) "C"

By “non-recursive” presumably it is using explicit stack management? That is “fair” — it is still using the same exponential-time algorithm, just via an optimized implementation.

On my machine it is about 1.7× faster than clang if I use gcc 15.2.0 to compile the C code:

Julia, Julia Cint, and C (via @btime):
  12.750 μs (0 allocations: 0 bytes)
  12.750 μs (0 allocations: 0 bytes)
  5.701 μs (0 allocations: 0 bytes)

Julia, Julia Cint, and C (via @timeit, times in ms):
julia,Julia Int,0.014292
julia,Julia Cint,0.015333
julia,C,0.006459

making Julia about 2.2× slower than gcc.

Can anyone replicate this using the code above?

3 Likes