Using LinuxPerf for small functions

LinuxPerf.jl wraps the perf_event_open Linux syscall. But using it for small functions gives ridiculous results. The following example reports over 12000 clock cycles and 2000 memory fetches to compute 1+1:

using LinuxPerf
@measure 1+1

Dumb question: what else is running, besides computing 1+1, that makes perf_event_open report so many events? Expression parsing?

@vchuravy ☝

First, run @measure twice, and inside a function:

f() = @measure 1+1
f()
f()

@measure does not run the expression many times and summarize the samples the way BenchmarkTools.@btime does (@btime reports the minimum over many runs), so the first time you run it you are also measuring compilation. Running it at global scope adds some overhead from executing top-level code, too.

Second, this function is far too small to be measured effectively by the perf subsystem. You're essentially making a syscall, executing 1-2 instructions, and then immediately making another syscall. Each of those syscalls takes a non-negligible number of instructions, both in Julia and in the kernel, and perf counts all of them. You could run the expression in a loop for some number of iterations, but even after dividing by the iteration count, the average would still include the loop overhead (at least 2 extra instructions per iteration).
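A rough cost model makes the loop trade-off concrete. Assume, for illustration only, that the perf setup and teardown (the two syscalls) cost a fixed S events per measurement, each evaluation of the measured expression costs c events, and each loop iteration adds ℓ events of loop overhead. These symbols are introduced here and are not part of LinuxPerf. Measuring N iterations then gives roughly

$$
M(N) = S + N\,(c + \ell), \qquad
\frac{M(N)}{N} = c + \ell + \frac{S}{N} \;\longrightarrow\; c + \ell \quad (N \to \infty)
$$

Increasing N amortizes the syscall cost S, but the per-iteration overhead ℓ stays in the average no matter how large N gets.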


Running @measure multiple times still gives me thousands of cycles (6000+, though quite variable across runs) and 2270 instructions.

Doing an equivalent benchmark in C gives me ~60 cycles and 13 instructions.

I always define

function foreachf(f::F, N, args::Vararg{Any,A}) where {F,A}
    # The F and A type parameters force Julia to specialize on f and on the number of args
    foreach(_ -> f(args...), 1:N)
end

so that it calls f(args...) a total of N times.
However, you'll have to make sure the compiler doesn't defeat the benchmark by optimizing the calls away, as it does for +(::Int,::Int).
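A minimal usage sketch of foreachf. The helper bump! and the counter calls are hypothetical, added only so the N calls have an observable effect that can be checked. Because each call mutates a Ref, the compiler cannot drop their combined effect, but it is still free to transform the loop, so this is not a general recipe for defeating the optimizer.

```julia
# foreachf as defined above: calls f(args...) exactly N times.
function foreachf(f::F, N, args::Vararg{Any,A}) where {F,A}
    foreach(_ -> f(args...), 1:N)
end

# Hypothetical test function: increments a counter stored in a Ref.
bump!(r::Ref{Int}) = (r[] += 1; nothing)

calls = Ref(0)
foreachf(bump!, 1_000, calls)
println(calls[])  # prints 1000
```

Presumably, in an actual measurement one would wrap such a call in @measure and divide the reported counts by N, which still leaves the per-iteration loop overhead in the result.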


Can you show the code for this C benchmark?