[ANN] Announcing PAPI.jl

I would like to introduce PAPI.jl, Julia bindings for the PAPI hardware performance counters library.

Performance Application Programming Interface (PAPI) is a portable and efficient API to access the performance counters available on modern processors. It attempts to provide a unified set of performance metrics across vendors and platforms. Besides performance counters, it includes components to measure power consumption, software-defined events, task granularity, etc.
PAPI includes a component for the Linux performance counters, so all those counters are also accessible from PAPI.jl.

PAPI.jl adds high-level functions on top, providing primitives that developers can easily understand and use, while also allowing access to some of the low-level primitives from Julia.
The basic interface is provided by two macros, @profile and @sample, both taking a set of events and a function call. @profile gives a one-shot profile, whereas @sample offers more detailed insight with multiple samples (useful for statistical testing).

Example usage:

using PAPI

function mysum(X::AbstractArray)
    s = zero(eltype(X))
    for x in X
        s += x
    end
    s
end

X = rand(10_000)
stats = @profile mysum(X)

which by default outputs a profile of the total and branch instruction counts:

  BR_INS = 4875055 # 22.0% of all instructions # 588.0 M/sec 
  BR_MSP = 188115 # 4.0% of all branches
  TOT_INS = 22197305 # 0.839 insn per cycle
  TOT_CYC = 26453432 # 3.191 Ghz # 1.192 cycles per insn
  runtime = 8289881 nsecs
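
As a sanity check, the derived figures in the comments can be reproduced from the raw counters with plain arithmetic. A minimal sketch using only the numbers printed above; the variable names are just for illustration, and this is not necessarily how PAPI.jl computes them internally. The misprediction ratio comes out at roughly 3.9%, which the output presumably shows rounded as 4.0%.

```julia
# Raw counter values copied from the sample output above.
BR_INS     = 4_875_055    # branch instructions
BR_MSP     = 188_115      # mispredicted branches
TOT_INS    = 22_197_305   # total instructions
TOT_CYC    = 26_453_432   # total cycles
runtime_ns = 8_289_881    # wall-clock time in nanoseconds

ipc             = TOT_INS / TOT_CYC          # instructions per cycle, ≈ 0.839
cpi             = TOT_CYC / TOT_INS          # cycles per instruction, ≈ 1.192
ghz             = TOT_CYC / runtime_ns       # cycles per nanosecond is GHz, ≈ 3.191
branch_pct      = 100 * BR_INS / TOT_INS     # share of branch instructions, ≈ 22.0
mispredict_pct  = 100 * BR_MSP / BR_INS      # share of mispredicted branches, ≈ 3.9
branches_per_us = 1000 * BR_INS / runtime_ns # millions of branches per second, ≈ 588

println("insn per cycle   = ", round(ipc, digits = 3))
println("cycles per insn  = ", round(cpi, digits = 3))
println("clock (GHz)      = ", round(ghz, digits = 3))
println("branches (%)     = ", round(branch_pct, digits = 1))
println("mispredicted (%) = ", round(mispredict_pct, digits = 1))
println("branches (M/sec) = ", round(branches_per_us, digits = 1))
```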

A set of events can also be provided, for example to give insight into vectorization:

julia> stats = @profile [PAPI.VEC_DP, PAPI.DP_OPS] mysum(X)
EventValues:
  VEC_DP = 1000000
  DP_OPS = 1000000 # 1.0x vectorized
  runtime = 1424735 nsecs

I use it extensively in my high-performance work and hope others might find it useful as well.
This kind of profiling, however, does not give a complete picture, and care needs to be taken when interpreting performance counters.

URL: GitHub - JuliaPerf/PAPI.jl: PAPI bindings for Julia

Awesome!

Is this the same @profile as Profile.@profile (hooked into it somehow)?

Sadly no. I was looking at the (profile) sampling code in Julia, in an attempt to hijack it. But at that point it was not extendable. The idea back then was to trigger a sampling event in Julia on a counter overflow, but it didn’t like me doing that without calling Profile.init and installing the timer.

PAPI.@profile counts the events during a single invocation of the function, as opposed to PAPI.@sample, which calls it many times. I mostly use it to see whether the optimizations I'm applying actually make a difference (e.g. in the number of cache misses).

In regard to combining it with Profile.@profile: I'm more inclined to use the perf tool outside of Julia. For that you would probably have to build Julia from source, as the jitevents for perf aren't enabled in the binary release :frowning:

How does @profile mysum(X) handle precompilation? Should we run mysum(X) first (so it compiles) and then @profile mysum(X), or is it fine to do just the latter?

Just a suggestion - maybe name it PAPI.@perfcounters or so then? A naming conflict with a very established function in a standard library may confuse people.

PAPI.@profile has a warmup parameter (default=0) and does that many warmup rounds prior to measuring.

It also includes a gcfirst parameter to do a few rounds of garbage collection first.
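
To illustrate what warmup rounds and a GC pass before measuring amount to, here is a rough sketch in plain Julia. `measure` is a hypothetical helper written for this illustration, not PAPI.jl's implementation, and it times the call with `time_ns()` instead of reading hardware counters. Its keyword names mirror the `warmup` and `gcfirst` parameters mentioned above, but how PAPI.@profile actually accepts them is not shown here.

```julia
# Hypothetical helper, NOT part of PAPI.jl: runs `warmup` untimed calls,
# optionally forces a garbage collection, then times one call of `f`.
function measure(f; warmup::Integer = 0, gcfirst::Bool = true)
    for _ in 1:warmup
        f()                  # untimed rounds, so compilation happens before measuring
    end
    gcfirst && GC.gc()       # collect now so a GC pause is less likely during the timed call
    t0 = time_ns()
    result = f()
    elapsed = time_ns() - t0
    return (result = result, nsecs = Int(elapsed))
end

# Same function as in the announcement.
function mysum(X::AbstractArray)
    s = zero(eltype(X))
    for x in X
        s += x
    end
    s
end

X = rand(10_000)
stats = measure(() -> mysum(X); warmup = 1)
println("result  = ", stats.result)
println("runtime = ", stats.nsecs, " nsecs")
```

With `warmup = 0` the measured call may also pay for compilation, which is why running the function at least once beforehand is usually what you want.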

I just found this: https://github.com/jakebolewski/PAPI.jl. Any connection?

Very outdated package that does something related.