Using the Intel VTune Profiler with Julia

Hi all,

I’m trying to profile the following piece of Julia code, taken from ProfileView.jl, using the external Intel VTune Profiler. Concretely, the code is

# vtune.jl
function profile_test(n)
    for i = 1:n
        A = randn(100,100,20)
        m = maximum(A)
        Am = mapslices(sum, A; dims=2)
        B = A[:,:,5]
        Bsort = mapslices(sort, B; dims=1)
        b = rand(100)
        C = B.*b
    end
end

profile_test(1)
profile_test(10)

As described in the relevant section of the Julia documentation, I compiled Julia (release-1.3) with USE_INTEL_JITEVENTS=1 in the Make.user file and set the environment variable ENABLE_JITPROFILING=1. Specifically, since I’m on Windows, I cross-compiled Julia using the recommended Cygwin-to-MinGW path.

To profile the code I fired up VTune and created a simple analysis (inspired by the Python tutorial here):

Inspecting the “Bottom-up” analysis results, I see all kinds of low-level jl_* entries but cannot find anything about my vtune.jl file or the profile_test function it contains.

Is there something else I need to do? Is VTune expected to work with Julia? If yes, to what extent? In the Python tutorial (linked above), both the script file runtime.py and the function black_scholes could be clearly identified as bottlenecks.

I’d appreciate any comments/hints on how to get better profiling results with VTune + Julia!

Best,
Carsten

Short update: I ran the same experiment on a Linux machine with a similar outcome. There is no mention of my file or function, as far as I can see.

Any advances on this topic?

I am experiencing the same issue. I compiled v1.3.1 with USE_INTEL_JITEVENTS:=1 in Make.user and am running it in VTune Profiler 2020 with

Application: /usr/bin/env
Application Parameters: ENABLE_JITPROFILING=1 /usr/bin/julia intelitttest.jl
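
For anyone launching from a terminal instead of the VTune GUI, the Application / Application Parameters pair above amounts to starting the program with ENABLE_JITPROFILING set in its environment. A minimal sketch, where the helper name run_with_jitprofiling is made up for illustration:

```shell
# Hypothetical helper: run any command with ENABLE_JITPROFILING=1 set
# only for that command, mirroring what /usr/bin/env does in the
# VTune "Application" field above.
run_with_jitprofiling() {
    ENABLE_JITPROFILING=1 "$@"
}
```

Usage would look like `run_with_jitprofiling /usr/bin/julia intelitttest.jl`; the variable is not exported into the surrounding shell.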

I am using the “instrumentation and tracing technology (ITT) API” to profile just the f01() function in

include(joinpath(@__DIR__,"IntelITT.jl"))
using Main.IntelITT

N = 256
A = rand(UInt32,1024*1024*N) .% UInt32(17)

function f01()
  B = cumsum(A)
end

precompile(f01,())

__itt_resume()

f01()

__itt_detach()

But VTune is not able to show the stack.

Got it working now.

The issue was that the prebuilt LLVM binary is compiled without LLVM_USE_INTEL_JITEVENTS.

I found a log file, logs/LLVM.log, documenting this:

CMake Warning:
  Manually-specified variables were not used by the project:

    USE_INTEL_JITEVENTS
    USE_OPROFILE
    USE_PERF

So maybe this is a bug in the provided binary.
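
To check whether a given build is affected, the log can be searched for the warning quoted above. A rough sketch, assuming the log looks like the excerpt in this post (the function name jitevents_flag_ignored is made up, and `grep -A` is assumed to be available, as it is in GNU and BSD grep):

```shell
# Hypothetical helper: succeeds (and prints the matching line) if the
# given log contains the CMake "variables were not used" warning and
# lists USE_INTEL_JITEVENTS within the few lines that follow it.
jitevents_flag_ignored() {
    log="$1"
    grep -A 5 'Manually-specified variables were not used' "$log" \
        | grep -w 'USE_INTEL_JITEVENTS'
}
```

For example, `jitevents_flag_ignored logs/LLVM.log && echo "flag was ignored"`. The `-w` keeps the check from matching the correctly prefixed LLVM_USE_INTEL_JITEVENTS.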

Setting USE_BINARYBUILDER=0 (and also USE_SYSTEM_LLVM=0, which might be the default anyway) results in LLVM being recompiled from source, which takes some time.
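
Put together, the settings mentioned in this thread could be appended to Make.user roughly as follows. This is only a sketch: the function name write_make_user and its directory argument are made up, and the flag values are simply the ones from this post.

```shell
# Hypothetical helper: append the build flags from this thread to
# Make.user inside the given Julia source directory.
write_make_user() {
    src_dir="$1"
    {
        echo "USE_INTEL_JITEVENTS=1"   # enable the Intel JIT events support
        echo "USE_BINARYBUILDER=0"     # build dependencies (including LLVM) from source
        echo "USE_SYSTEM_LLVM=0"       # do not use a system-provided LLVM
    } >> "$src_dir/Make.user"
}
```

After `write_make_user /path/to/julia`, a rebuild of Julia is still needed for the flags to take effect.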

Thanks for sharing this. Will try it on Windows later today.

Would you mind explaining what the “IntelITT.jl” business is about?

If not, then it should definitely be mentioned in the docs. Mind opening an issue over on GitHub to discuss this?

There is a library for something called the Instrumentation and Tracing Technology (ITT) API, which lets an application pass debug information to the VTune profiler.

  • You can start an application with VTune in “pause” mode.
  • The application can call __itt_resume(), which makes the profiler start collecting data.
  • You can pass string annotations describing what is going on to the API; the profiler can use them later to group events in an analysis.
  • The application can call __itt_detach(), which tells VTune to finish the data collection.

I just found out about this myself, tried to hack together a Julia binding for the libittnotify library, and uploaded it for you on GitHub. I haven’t tested anything more specific, but it seems to work at least for this simple case.

Of course :)

… Although I am not quite sure where this bug report belongs. Is it the LLVMBuilder repository, the BinaryBuilder.jl repository, or the Julia repository?

Do you have any advice on which one to choose?

EDIT: Further, as far as I understand, this feature of LLVM is based on the JIT Profiling API, and unfortunately the disassembly seems broken. I can open the disassembly section in VTune, but it displays the same single assembly instruction on every line. This might be a bug in LLVM not reporting the assembly correctly, or it is not intended to work at all, or it is a misconfiguration on my side. So this needs further investigation.

I’m not sure. I’d go with JuliaLang/julia for now.

@giordano Maybe you can point us to the right repo here?

LLVM nowadays is built in Yggdrasil

Unfortunately, building with USE_BINARYBUILDER=0 in Cygwin on Windows fails for me, so I can’t get it to work. I’m wondering whether compiling LLVM with LLVM_USE_INTEL_JITEVENTS would be safe as a default (does it come with a performance penalty or something similar?).

I opened an issue to discuss whether it makes sense to set the flag for the prebuilt binaries.

I think this was already the intention, but it fails because of a typo: USE_INTEL_JITEVENTS is used instead of LLVM_USE_INTEL_JITEVENTS (which is why the pre-build CMake log contains this warning).

Not adding anything to the discussion, but using Intel VTune with Julia is impressive! It’s something I thought about recently. I hope to get hands-on with some bigger systems soon, so I could help with this.

Also, Yggdrasil was the original Linux distribution on floppy disks. I thought it was long dead. Is the Julia builder related in some way?

No