Choosing between KernelAbstractions, AcceleratedKernels, ParallelStencil, or just CUDA.jl

Hello,

Until now, I have mainly used CUDA.jl to run simulations. I mostly do hydrodynamics and solve various PDEs.

I would like to make my code usable by other people in my lab, run simulations on CPUs and Metal GPUs as well, and compare performance across GPUs and CPUs and against ParallelStencil.jl, which claims to be faster than a naive CUDA.jl implementation like mine.

The problem is that I am a bit lost among all these packages. AcceleratedKernels.jl claims to be faster than naive KernelAbstractions.jl code, but I don't know whether it is still maintained.

ParallelStencil.jl looks well suited to my finite-difference simulations but is not designed for spectral methods. Also, with KernelAbstractions.jl I can create a kernel once and launch it later, as I do in CUDA.jl:

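# CUDA.jl: compile once without launching, then launch with an explicit configuration
f! = @cuda launch=false f_kernel(A, B)
f!(A, B; threads=threads, blocks=blocks)
# KernelAbstractions.jl: instantiate once for a backend and workgroup size, then launch
backend = get_backend(A)                 # CPU(), CUDABackend(), MetalBackend(), ...
f! = f_kernel!(backend, block_size)
f!(A, B; ndrange=size(A))                # ndrange = total number of work items
KernelAbstractions.synchronize(backend)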
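To make the KernelAbstractions.jl pattern concrete, here is a minimal sketch (only reasoned about for the CPU backend, not tested on a GPU) of an explicit finite-difference diffusion step whose kernel is instantiated once and then launched many times. `diffusion_step!` and `run!` are hypothetical names; with plain `Array`s it runs on `CPU()`, and passing a `CuArray` or `MtlArray` should select the matching backend through `get_backend`, assuming CUDA.jl or Metal.jl is loaded.

```julia
using KernelAbstractions

# One explicit step of 1D diffusion u_t = ν u_xx on interior points.
# α = ν Δt / Δx²; the two boundary values are never written.
@kernel function diffusion_step!(unew, @Const(u), α)
    j = @index(Global)          # work-item index 1 .. n-2
    i = j + 1                   # interior grid index 2 .. n-1
    @inbounds unew[i] = u[i] + α * (u[i-1] - 2 * u[i] + u[i+1])
end

# Hypothetical driver: build the kernel object once, reuse it every time step.
function run!(u, unew, α, nsteps; backend = get_backend(u), workgroupsize = 64)
    step! = diffusion_step!(backend, workgroupsize)    # instantiate once
    for _ in 1:nsteps
        step!(unew, u, α; ndrange = length(u) - 2)     # launch many times
        u, unew = unew, u                              # swap buffers
    end
    KernelAbstractions.synchronize(backend)
    return u                                           # latest state
end
```

Usage: `u = zeros(101); u[51] = 1.0; run!(u, copy(u), 0.25, 100)`. Keeping α ≤ 0.5 keeps this explicit scheme stable; the second buffer must start as a copy so its boundary values match.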

Is it possible to do the same with ParallelStencil.jl?

What would be the best choice for me? I would prefer to use the same package for all my simulations, whether they are based on pseudo-spectral methods or finite differences.

I mainly solve the Navier-Stokes equations with the Lattice Boltzmann Method, Poisson equations with FFTW or iterative schemes, and advection-diffusion equations. Nothing very complicated, so would a higher-level package such as WaterLily.jl, Trixi.jl, or FiniteVolumeMethod.jl also make sense? But here again, there are a great many choices.
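For the pseudo-spectral side, a periodic Poisson solve only needs `fft`, `ifft`, and wavenumber arrays. The sketch below assumes a periodic square domain [0, L)² and uses FFTW.jl on the CPU; `poisson_periodic` is a hypothetical helper. CUDA.jl's CUFFT module implements the same AbstractFFTs `fft`/`ifft` calls for `CuArray`s, but the wavenumber arrays here are built on the host, so they would have to be moved to the device before this body could run on a GPU.

```julia
using FFTW   # re-exports fft, ifft and fftfreq from AbstractFFTs

# Solve ∇²u = f on the periodic square [0, L)², returning the zero-mean solution.
function poisson_periodic(f::AbstractMatrix{<:Real}, L::Real)
    nx, ny = size(f)
    kx = (2π / L) .* fftfreq(nx, nx)   # angular wavenumbers along x in FFT order
    ky = (2π / L) .* fftfreq(ny, ny)   # same along y
    k2 = kx .^ 2 .+ (ky .^ 2)'         # |k|² on the nx × ny grid
    fh = fft(f)
    # In Fourier space ∇² becomes -|k|²; the k = 0 mode is set to zero
    # (broadcasted ifelse avoids scalar indexing, which GPU arrays disallow).
    uh = ifelse.(k2 .== 0, zero(eltype(fh)), -fh ./ k2)
    return real.(ifft(uh))
end
```

As a check, for f = sin(x) cos(2y) on [0, 2π)² the exact solution is u = -f/5, since the Laplacian multiplies that mode by -(1² + 2²).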

Thanks in advance for your help!

Best,

AcceleratedKernels.jl provides parallel implementations of common algorithms (sorting, reductions, scans, and so on) built on KernelAbstractions.jl, so it targets the same GPU backends. It is still maintained; AMDGPU.jl, for example, uses it.
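A minimal sketch of what AcceleratedKernels.jl code looks like, assuming the `foreachindex`, `reduce` (with a required `init` keyword), and `sort!` functions as I understand them from the package README; check the current documentation for exact keywords. The same calls are meant to work unchanged on `CuArray`, `ROCArray`, or `MtlArray` inputs, with the backend picked from the array type.

```julia
import AcceleratedKernels as AK

x = collect(1.0:10.0)
y = similar(x)

# Backend-agnostic parallel loop over the indices of x
# (CPU threads for Array, a GPU kernel for device arrays).
AK.foreachindex(x) do i
    y[i] = 2 * x[i] + 1
end

total = AK.reduce(+, y; init = 0.0)   # parallel reduction

z = rand(1000)
AK.sort!(z)                           # in-place parallel sort
```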

Similarly, ParallelStencil.jl and Chmy.jl both use KernelAbstractions underneath, but provide a lot more domain specific tools on top of it.
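As an illustration of those domain-specific tools, here is a sketch of one 2D diffusion step written with ParallelStencil.jl's finite-difference macros, modeled on the 3D diffusion example in its README. It assumes the `Threads` backend; changing the first argument of `@init_parallel_stencil` to `CUDA` (with CUDA.jl installed) is intended to run the same kernel on an NVIDIA GPU. The kernel name and parameter values are hypothetical.

```julia
using ParallelStencil
using ParallelStencil.FiniteDifferences2D
@init_parallel_stencil(Threads, Float64, 2)   # backend, number type, dimensions

# @inn selects interior points; @d2_xi / @d2_yi are second differences
# along x / y evaluated on the matching interior points.
@parallel function diffusion2D_step!(T2, T, lam, dt, dx, dy)
    @inn(T2) = @inn(T) + dt * lam * (@d2_xi(T) / dx^2 + @d2_yi(T) / dy^2)
    return
end

nx, ny = 64, 64
T  = @zeros(nx, ny)      # allocates on the selected backend
T[32, 32] = 1.0
T2 = copy(T)
@parallel diffusion2D_step!(T2, T, 1.0, 0.1, 1.0, 1.0)   # lam, dt, dx, dy
```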

WaterLily.jl and Trixi.jl sit one abstraction level higher still: they are complete solver frameworks rather than kernel toolkits.

Trixi has a Lattice Boltzmann example here: Trixi.jl/examples/tree_3d_dgsem/elixir_lbm_taylor_green_vortex.jl (trixi-framework/Trixi.jl on GitHub, commit 5178770a1fa5e9e7021b4a77e78d788b565b33cd)