KernelAbstractions Autotuning

smartalecH · December 24, 2023, 6:48pm

I’m going through the recent paper comparing batched KernelAbstractions kernels to standard array batching in JAX and PyTorch. In section 5.1.2, I found this interesting tidbit:

KernelAbstractions.jl performs a limited form of auto-tuning by optimizing the launch parameters for occupancy.

I went back to the docs to see if I could find anything describing this, but came up empty-handed.

Perhaps I’m looking in the wrong place. Does anyone have any references that describe this functionality (and how well it works across different hardware platforms)?

Thanks!

vchuravy · December 25, 2023, 11:48am

It is backend dependent, but if you don’t specify the workgroupsize, the backend makes an educated guess.

As an example, CUDA does this in CUDA.jl/src/CUDAKernels.jl at 3605167a9ea3aebfc944cc88ea0f86f01723a764 (JuliaGPU/CUDA.jl on GitHub).
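
For illustration, here is a minimal sketch of the two launch styles being discussed, assuming the KernelAbstractions 0.9-style API. The kernel name `scale!` and the array size are hypothetical, and the CPU backend is used only so the snippet runs without a GPU. What the backend actually picks when workgroupsize is omitted is backend specific and is not shown here.

```julia
using KernelAbstractions

# Hypothetical example kernel: scale every element of `a` by `s`.
@kernel function scale!(a, s)
    i = @index(Global)
    a[i] *= s
end

backend = CPU()              # swap in a GPU backend (e.g. CUDABackend()) if available
a = ones(Float32, 1024)

# Instantiate the kernel without a static workgroupsize.
kernel! = scale!(backend)

# Launch with only ndrange: the backend chooses the workgroupsize itself.
kernel!(a, 2f0; ndrange = length(a))
KernelAbstractions.synchronize(backend)

# Launch with an explicit workgroupsize: no guess is made by the backend.
kernel!(a, 2f0; ndrange = length(a), workgroupsize = 256)
KernelAbstractions.synchronize(backend)
```

Omitting workgroupsize is the case where the backend's educated guess applies. Passing it explicitly is the way to override that guess when benchmarking different configurations.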
