Hi,
I wanted to profile cache misses in some code of mine. I appear to be incapable of second-guessing the hardware prefetcher, and wanted to see whether some explicit prefetches improve my code. These are kinda accessible via something like:
#this is for read data that is needed soon.
#maybe experiment with different locality values?
#prefetch on the instruction cache crashes julia during compilation. meh, not needed anyway.
@inline function prefetch(address)
    #arguments to llvm.prefetch: address, rw (0 = read), locality (0..3), cache type (1 = data)
    Base.llvmcall(("declare void @llvm.prefetch(i8*, i32, i32, i32)",
        "call void @llvm.prefetch(i8* %0, i32 0, i32 0, i32 1)
        ret void"), Void, Tuple{Ptr{Int8}}, convert(Ptr{Int8}, address))
end
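For reference, a call site would look something like this (array and index are just placeholders):

A = rand(Int, 1024)
prefetch(pointer(A, 42))  #hint that A[42] will be read soon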
So, how do you people measure e.g. L1 misses in julia code?
Should I try static-julia and (Linux) perf?
If that is indeed the only reasonable way, should I then write my sample code with a Base.@ccallable ju_main() function, compile it into a shared library, and write a tiny C program that calls the library’s ju_main? Can I use static-julia to build with debug symbols?
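For concreteness, I'm picturing an entry point roughly like this (ju_main and the helpers are placeholders for my actual code):

Base.@ccallable function ju_main()::Cint
    g = load_graph()  #placeholder for my setup code
    traverse(g)       #the hot loop I actually want to measure
    return 0
end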
PS. My code does a graph traversal; links are offsets into a fixed array (see the sketch below). This should be very bad for the cache. But I need proper measurements before I consider fancy memory layouts; a couple of explicit prefetches are a much lower-effort fix than cache-oblivious stuff.
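To illustrate the access pattern, a stripped-down version of the traversal might look like this (layout and names are made up for the example):

#links are stored as offsets into one fixed array, so each step is a
#dependent load.
function traverse(next::Vector{Int}, start::Int, steps::Int)
    v = start
    for _ in 1:steps
        w = next[v]
        prefetch(pointer(next, w))  #hint the upcoming load of next[w];
                                    #only pays off if there is other work
                                    #to overlap with before w is used
        #... real work on v would go here ...
        v = w
    end
    return v
end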