How would I retrieve the result from CUDA.@profile?


Checking out the new release of CUDA.jl, which really has made profiling a lot easier! Much appreciated. I was wondering though, how do I retrieve the result from the profiling?

julia> a = CUDA.rand(1024, 1024, 1024)
julia> CUDA.@profile trace=true CUDA.@sync a .+ a
Profiler ran for 12.29 ms, capturing 527 events.

Host-side activity: calling CUDA APIs took 11.75 ms (95.64% of the trace)
│  ID │     Start │      Time │ Thread │                    Name │
│   5 │   6.91 µs │  13.59 µs │      1 │ cuMemAllocFromPoolAsync │
│   9 │  36.72 µs │ 199.56 µs │      1 │          cuLaunchKernel │
│ 525 │ 510.69 µs │  11.75 ms │      2 │     cuStreamSynchronize │

What I wish for is something similar to BenchmarkTools, where one can capture the timing results in a variable, display them later, manipulate the data, etc. I am not sure how to do that with CUDA.jl though?

Perhaps I am missing something obvious :slight_smile:

Kind regards

The reason we don’t have such an API is that it’s not clear what the ‘result from profiling’ should be. Is it the kernel times? The API times? The NVTX ranges? Or just the total execution time of the code on the CPU? Or on the GPU? The profiler UI reports all of that information.

If you just want to measure the total execution time, you should use Base.@elapsed CUDA.@sync ... for the CPU time, and CUDA.@elapsed ... for the GPU time.
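For example (a minimal sketch of the two macros mentioned above; this requires a CUDA-capable GPU to run):

```julia
using CUDA

a = CUDA.rand(1024, 1024, 1024)

# CPU-side wall time, including waiting for the GPU to finish:
cpu_t = Base.@elapsed CUDA.@sync a .+ a

# GPU-side time, measured with CUDA events:
gpu_t = CUDA.@elapsed a .+ a
```

Both return the elapsed time in seconds, so you can store and compare them like any other value.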

If you want more specific times, there’s CUDA.@profiled now (on CUDA.jl#master), which returns the internal state of the profiler, but the format of that isn’t documented. That’s on purpose, so it can change without requiring a breaking release.


Thanks for the explanation!

Perfect, I will see if I can get those to work well for me. In any case, this is a huge step up from before, so I'm still very pleased.

On a side note: I am not sure who selected your answer as the solution, and while I agree it is, and it is very well explained, it would be nice if I had the option to select the solution myself :slight_smile:

Maybe we should return an object that contains the data and uses show to be nicely displayed.

I noticed the other day that the current output to stdout doesn’t look as nice in Pluto.
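As an illustration of that idea, here's a purely hypothetical sketch (the type and field names are invented, not CUDA.jl code): an object that holds the raw events but prints a readable summary through a custom `show` method.

```julia
# Hypothetical result type; names here are illustrative only.
struct ProfileResultSketch
    host_events::Vector{NamedTuple}  # e.g. (; id, start, time, name)
end

# A custom text/plain `show` lets the REPL (or Pluto) render a summary
# while the raw data stays programmatically accessible.
function Base.show(io::IO, ::MIME"text/plain", r::ProfileResultSketch)
    println(io, "Profiler captured $(length(r.host_events)) host events:")
    for ev in r.host_events
        println(io, "  $(ev.name): $(ev.time * 1e6) µs")
    end
end

r = ProfileResultSketch([(id = 9, start = 36.72e-6, time = 199.56e-6, name = "cuLaunchKernel")])
r.host_events[1].name  # data access works independently of the display
```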


That’s a good idea, and would allow unifying @profile and @profiled.


Unfortunately, the solution button and the like button are very close to each other and it is very easy to misclick, especially on a mobile phone. So maybe that’s what happened. In any case, you should be able to revoke the solution tag (I think).


I implemented this in Profiler: Improve compatibility with Pluto.jl and friends. by maleadt · Pull Request #2139 · JuliaGPU/CUDA.jl · GitHub. CUDA.@profiled is no more; CUDA.@profile now returns a ProfileResult object that displays as before. You can peek into that struct to get the profile data, but I’d recommend adding some accessors (e.g. get_kernels(::ProfileResult), an iterator, etc.) based on your use case and submitting a pull request for them. I’m happy to support accessors like that; the internal fields, however, can change between releases.
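With that change, capturing the result looks roughly like this (a sketch against CUDA.jl master after that PR; it needs a CUDA-capable GPU, and the internal field names are deliberately not stable):

```julia
using CUDA

a = CUDA.rand(1024, 1024, 1024)
res = CUDA.@profile trace=true CUDA.@sync a .+ a

res                 # displays the usual profiler tables via `show`
propertynames(res)  # inspect the (unstable) internal fields
```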


Ah, thank you, I imagine that was the case :blush:

I see @maleadt and others have gone ahead and implemented what I was asking about, so I think it is fair to mark his new answer at the bottom as the solution for now!

Kind regards