The Performance Application Programming Interface (PAPI) is a portable and efficient API for accessing the performance counters available on modern processors. It attempts to provide a unified set of performance metrics across vendors and platforms. Besides performance counters, it includes components for measuring power consumption, software-defined events, task granularity, and more.
PAPI includes a component for the Linux performance counters, so all of those counters are also accessible from PAPI.jl.
PAPI.jl adds high-level functions on top, providing primitives that developers can easily understand and use, while also allowing access to some of the low-level primitives from Julia.
The basic interface is provided by two macros, ‘profile’ and ‘sample’, both of which take a set of events and a function call. ‘profile’ gives a one-shot profile, whereas ‘sample’ offers more detailed insight by collecting multiple samples (useful for statistical testing). For example:
    using PAPI

    function mysum(X::AbstractArray)
        s = zero(eltype(X))
        for x in X
            s += x
        end
        s
    end

    X = rand(10_000)
    stats = @profile mysum(X)
By default, this outputs a profile of the total and branch instruction counts, along with the cycle count and runtime:
    BR_INS = 4875055 # 22.0% of all instructions # 588.0 M/sec
    BR_MSP = 188115 # 4.0% of all branches
    TOT_INS = 22197305 # 0.839 insn per cycle
    TOT_CYC = 26453432 # 3.191 Ghz # 1.192 cycles per insn
    runtime = 8289881 nsecs
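The annotations after each `#` look like simple ratios of the raw counters. The sketch below recomputes them in plain Julia from the numbers printed above, assuming that is how they are derived; the variable names are mine and are not part of PAPI.jl. It does not require PAPI to run.

```julia
# Raw values copied from the profile output above.
BR_INS     = 4_875_055    # branch instructions
BR_MSP     = 188_115      # mispredicted branches
TOT_INS    = 22_197_305   # total instructions
TOT_CYC    = 26_453_432   # total cycles
runtime_ns = 8_289_881    # wall-clock runtime in nanoseconds

# Derived metrics (assumed to be plain ratios of the counters).
branch_share_pct = 100 * BR_INS / TOT_INS      # branches as a share of all instructions
mispredict_pct   = 100 * BR_MSP / BR_INS       # mispredictions as a share of branches
ipc              = TOT_INS / TOT_CYC           # instructions per cycle
cpi              = TOT_CYC / TOT_INS           # cycles per instruction
clock_ghz        = TOT_CYC / runtime_ns        # cycles per nanosecond equals GHz
branch_rate_M    = 1e3 * BR_INS / runtime_ns   # branches per ns, scaled to millions per second

println("branch share   = ", round(branch_share_pct; digits = 1), " %")
println("mispredictions = ", round(mispredict_pct; digits = 1), " %")
println("IPC            = ", round(ipc; digits = 3))
println("CPI            = ", round(cpi; digits = 3))
println("clock          = ", round(clock_ghz; digits = 3), " GHz")
println("branch rate    = ", round(branch_rate_M; digits = 1), " M/sec")
```

Most of these reproduce the printed annotations to the digits shown. The misprediction share comes out slightly below the printed 4.0%, so that annotation may be rounded differently.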
A set of events can also be provided explicitly, for example to gain insight into vectorization:
    julia> stats = @profile [PAPI.VEC_DP, PAPI.DP_OPS] mysum(X)
    EventValues:
    VEC_DP = 1000000
    DP_OPS = 1000000 # 1.0x vectorized
    runtime = 1424735 nsecs
I use it extensively in my high-performance work and hope others might find it useful as well.
This kind of profiling, however, does not give a complete picture, and care needs to be taken when interpreting performance counters.
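One common source of misleading numbers is a single measurement that includes JIT compilation or system noise. The sketch below illustrates the general idea of warming up and then collecting several samples. It uses wall-clock time from `time_ns()` as a stand-in for a hardware counter, and `collect_samples` is a hypothetical helper of mine, not part of PAPI.jl.

```julia
using Statistics  # standard library: median, quantile

function mysum(X::AbstractArray)
    s = zero(eltype(X))
    for x in X
        s += x
    end
    s
end

# Hypothetical helper (not part of PAPI.jl): time `f(args...)` `n` times.
# Returns the samples in nanoseconds and the last result, so the calls
# cannot be optimized away as unused.
function collect_samples(f, args...; n::Integer = 30)
    result = f(args...)              # warm-up call, so compilation is not measured
    samples = Vector{Float64}(undef, n)
    for i in 1:n
        t0 = time_ns()
        result = f(args...)
        samples[i] = Float64(time_ns() - t0)
    end
    return samples, result
end

X = rand(10_000)
samples, result = collect_samples(mysum, X)

# Report robust summaries rather than a single value.
println("median = ", median(samples), " ns")
println("min    = ", minimum(samples), " ns")
println("IQR    = ", quantile(samples, 0.75) - quantile(samples, 0.25), " ns")
```

Looking at the median and spread rather than one number makes outliers, such as an interrupted run, easier to spot.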