Several questions about KernelAbstractions

Hello,

I have several questions about the KernelAbstractions package:

  1. What is the current status of KernelAbstractions in the Julia GPU ecosystem? Is it seen as an exploratory package for trying out possible directions in heterogeneous CPU/GPU programming, or is it aiming to become a standard for future code development, agnostic to CPU/GPU?

  2. Why are return statements not permitted in kernel functions? There is no such issue with kernels written with CUDA.jl.

  3. Why does one have to be so strict about waiting on kernel events? I see wait(event) in every example in the documentation. At the same time, as far as I understand, kernels written with CUDA.jl are also asynchronous, yet no one forces you to use @sync after every call.

  4. In CUDA.jl there is a launch_configuration function which lets you determine an optimal number of threads and blocks for launching a kernel (typical usage is sketched below for reference). Is there a similar function in KernelAbstractions?
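For reference, a minimal sketch of how launch_configuration is typically used in CUDA.jl; the kernel and array names here are illustrative:

```julia
using CUDA

function vadd!(c, a, b)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(c)
        c[i] = a[i] + b[i]
    end
    return nothing
end

c = CUDA.zeros(Float32, 1_000_000)
a = CUDA.rand(Float32, 1_000_000)
b = CUDA.rand(Float32, 1_000_000)

kernel = @cuda launch=false vadd!(c, a, b)   # compile without launching
config = launch_configuration(kernel.fun)    # occupancy-based suggestion
threads = min(length(c), config.threads)
blocks = cld(length(c), threads)
kernel(c, a, b; threads, blocks)
```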

KA (KernelAbstractions) currently acts as a more minimal, cross-vendor alternative to writing vendor-specific kernel functions. I don’t think it’s going anywhere but up; it already works well with CUDA and AMDGPU (WIP), and it’s well maintained and tested by @vchuravy and users in the HPC space.

My guess is that allowing return statements would cause problematic behavior due to thread divergence, but I’m not clear on the exact reasoning. It might also be related to how KA optimizes code: return statements could make that harder if code paths diverge significantly.
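In practice, where CUDA.jl code might use an early return, a KA kernel can guard with a branch instead. A minimal sketch of that pattern (the kernel name is illustrative):

```julia
using KernelAbstractions

@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)
    if i <= length(y)        # branch instead of an early `return`
        y[i] = a * x[i] + y[i]
    end
end
```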

It’s not strict, just explicit. You don’t have to call wait(event) immediately after a kernel is launched; in fact, you never need to call it if you don’t want to. The event is just an indicator that a kernel has finished, and it lets dependent kernels execute in the order the user expects. AMDGPU.jl also does this, and it has worked out well.
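For example, assuming the event-based API from the KA version discussed in this thread (array and device names are illustrative), dependent kernels can be chained without waiting in between:

```julia
using CUDA, CUDAKernels, KernelAbstractions

@kernel function mul2!(A)
    i = @index(Global)
    A[i] *= 2
end

A = CUDA.ones(Float32, 1024)
kernel = mul2!(CUDADevice(), 256)

ev1 = kernel(A, ndrange=length(A))                        # launch, get an event back
ev2 = kernel(A, ndrange=length(A), dependencies=(ev1,))   # runs only after ev1
wait(ev2)   # wait once, at the point where the result is actually needed
```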

For CUDAKernels, the launch configuration will be calculated automatically if workgroupsize is set to nothing.
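Continuing the sketch above, and assuming that omitting the static workgroup size is equivalent to passing workgroupsize = nothing:

```julia
# No static workgroupsize: CUDAKernels picks threads/blocks via the occupancy API.
kernel = mul2!(CUDADevice())
wait(kernel(A, ndrange=length(A)))
```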

Statements returning a value are also not permitted in CUDA.jl (only return or return nothing; I assume the same applies to KA.jl). The reason is that there’s no clear meaning – what if different threads return different values? – and it could make the kernel launch synchronous. CUDA C also does not allow returning values.
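A minimal sketch of the rule in CUDA.jl (kernel and variable names are illustrative):

```julia
using CUDA

function scale!(a, s)
    i = threadIdx().x
    i > length(a) && return   # a bare `return` (i.e. `return nothing`) is fine
    a[i] *= s
    return nothing            # also fine
    # `return a[i]` would be rejected: kernels must return nothing
end

a = CUDA.rand(32)
@cuda threads=32 scale!(a, 2f0)
```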

Thank you.

Are you aware of any other packages similar to KA?

By the way, where can I find the sources of the CUDAKernels package? For some reason I do not see them on GitHub.

Would it be better to allow the return nothing statement in KA, for consistency?

It’s a subpackage: https://github.com/JuliaGPU/KernelAbstractions.jl/tree/master/lib/CUDAKernels

Thank you.