CUDA.jl has a great feature for sizing threads and blocks, namely
launch_configuration(). I rarely size my kernels manually; instead I do something like:
```julia
kernel = @cuda launch=false myfunc(args...)
config = launch_configuration(kernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
kernel(args...; threads, blocks)
```
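(Here `min(N, config.threads)` avoids launching more threads than there are elements when N is small, and `cld` rounds the block count up so the whole range is covered.)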
The result is almost always very close to optimal, and it lets my code move from device to device without my having to worry much about launch parameters.
However, from what I can tell, ROCm doesn’t expose an equivalent, and neither does AMDGPU.jl. As a downstream consequence, neither does KernelAbstractions.jl.
So I guess my question is: how should I be sizing my ROCm kernels in a way that is fairly close to optimal and will work across a range of different AMD devices?
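For concreteness, here’s the kind of static sizing I’d otherwise fall back to. This is just a sketch with a made-up element-wise kernel; I’m also assuming the convention where `gridsize` counts workgroups (in older AMDGPU.jl versions it was the total number of work-items):

```julia
using AMDGPU

# Hypothetical element-wise kernel, just for illustration.
function scale!(y, x)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(y)
        @inbounds y[i] = 2f0 * x[i]
    end
    return
end

N = 2^20
x = AMDGPU.rand(Float32, N)
y = similar(x)

# Wavefronts are 64 wide on GCN/CDNA (32 on RDNA), so pick a
# groupsize that is a multiple of 64; 256 is a common default.
groupsize = min(N, 256)
gridsize = cld(N, groupsize)  # assumes gridsize counts workgroups
@roc groupsize=groupsize gridsize=gridsize scale!(y, x)
```

A fixed 256 is a reasonable lowest common denominator, but it obviously ignores per-kernel register and LDS pressure, which is exactly what launch_configuration() accounts for on the CUDA side.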