ANN: CUDA.jl 3.3

Hi all,

I released CUDA.jl 3.3 on Friday, with several exciting new features. There’s a blog post summarizing those features, as well as some from CUDA.jl 3.1 and 3.2, which didn’t get blog posts of their own: CUDA.jl 3.3 ⋅ JuliaGPU

Key highlights:

  • CuArray support for isbits union element types, useful for arrays containing nothing or missing (see the sketch after this list)
  • Ability to emit debug and location information for GPU code
  • Support for CUDA’s semantic versioning (so you can use CUDA 11.3 on a driver for 11.0)
  • High-level wrappers for the CUDA graph APIs
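For instance, a minimal sketch of the isbits-union support (assuming CUDA.jl 3.3 and a CUDA-capable device; coalesce is plain Base Julia, not a CUDA.jl API):

using CUDA

# the element type is Union{Missing, Float32}, stored on the GPU as an isbits union
a = CuArray([1f0, missing, 3f0])

# broadcasting compiles and runs on the GPU, just like with plain element types
b = coalesce.(a, 0f0)  # replace missing values with 0f0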

How should the new graph APIs be applied to custom kernels?
Something like this?

@captured @cuda threads=Nth blocks=Nbl kernel(A)

And if I use the occupancy API to compute the launch configuration, e.g.

ckernel = @cuda launch=false kernel(A)
config = launch_configuration(ckernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
ckernel(A; threads=threads, blocks=blocks)

how should I apply the @captured macro?
Like this:

@captured begin
    ckernel = @cuda launch=false kernel(A)
    config = launch_configuration(ckernel.fun)
    threads = min(N, config.threads)
    blocks = cld(N, threads)
    ckernel(A; threads=threads, blocks=blocks)
end

or like this:

ckernel = @cuda launch=false kernel(A)
config = launch_configuration(ckernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
@captured ckernel(A; threads=threads, blocks=blocks)

Thank you.


The macro doesn’t care; just encapsulate any chunk of code that performs a launch. See the tests, for example: CUDA.jl/graph.jl at 71d5f39daf4ffcb8d104f5a10a26f096b8150695 · JuliaGPU/CUDA.jl · GitHub. Note that graph recording doesn’t support all CUDA APIs, but the occupancy API shouldn’t be a problem (the broadcast example in the blog post uses it to determine a launch configuration).
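For example, here’s a minimal sketch combining the occupancy API with @captured in a loop (assuming a trivial increment kernel; the macro records the launch into a graph and caches the instantiated executable across iterations, updating it rather than re-instantiating):

using CUDA

function kernel(A)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(A)
        @inbounds A[i] += 1
    end
    return
end

A = CUDA.zeros(Int, 1024)

for step in 1:10
    @captured begin
        ckernel = @cuda launch=false kernel(A)
        config = launch_configuration(ckernel.fun)
        threads = min(length(A), config.threads)
        blocks = cld(length(A), threads)
        ckernel(A; threads=threads, blocks=blocks)
    end
end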
