Optimize code which uses KernelAbstractions.jl


In RadonKA.jl I just launch the kernel with the following code:

    kernel! = radon_kernel!(backend)
    kernel!(sinogram::AbstractArray{T}, img, weights, in_height, 
            out_height, angles, mid, radius, absorb_f,
    return sinogram::typeof(img)

@kernel function radon_kernel!(sinogram::AbstractArray{T}, img::AbstractArray{T}, 
                               weights, in_height, out_height, angles, mid,
                               radius, absorb_f) where {T}
    i, iangle, i_z = @index(Global, NTuple)

I am asking because the KA docs mention the groupsize.
Should I care about it? And if so, what is a reasonable value to choose?
My arrays range from sizes like (256,256) to 3D arrays such as (512,512,512).

I also tried annotating all arguments except sinogram with @Const, but that didn't improve performance.
Are there any other free performance tricks I can use?



KA uses a limited form of auto-tuning to select the group size. I would recommend using the native performance tools from CUDA to look at kernel performance.
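
To compare the automatic choice against a hand-picked group size, something like the following minimal sketch could be used. It is not the RadonKA.jl kernel: `scale_kernel!`, the array sizes, and the group size `(16, 16, 1)` are hypothetical stand-ins chosen only for illustration, and it runs on the CPU backend so it works without a GPU.

```julia
using KernelAbstractions

# Hypothetical stand-in kernel (NOT the RadonKA.jl kernel): scales a 3D array.
@kernel function scale_kernel!(out, @Const(inp), factor)
    i, j, k = @index(Global, NTuple)
    @inbounds out[i, j, k] = factor * inp[i, j, k]
end

backend = CPU()  # assumption: swap in a GPU backend such as CUDABackend() when CUDA.jl is loaded
inp = rand(Float32, 64, 64, 8)
out_auto = similar(inp)
out_fixed = similar(inp)

# 1) No group size given: KA selects one for the backend.
kernel_auto! = scale_kernel!(backend)
kernel_auto!(out_auto, inp, 2f0; ndrange=size(inp))

# 2) Explicit group size, fixed when the kernel is instantiated.
#    (16, 16, 1) is an arbitrary illustrative value, not a recommendation.
kernel_fixed! = scale_kernel!(backend, (16, 16, 1))
kernel_fixed!(out_fixed, inp, 2f0; ndrange=size(inp))

# Kernel launches are asynchronous, so wait before reading or timing results.
KernelAbstractions.synchronize(backend)
```

Both launches should produce the same result, so any difference shows up only in run time. On a CUDA backend, each launch (followed by the synchronize call) could be wrapped in `CUDA.@profile` to see how the kernel behaves for a given group size.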
