Hi all,
I released CUDA.jl 3.3 on Friday, with several exciting new features. There’s a blog post summarizing those features, as well as some from CUDA.jl 3.1 and 3.2 (for which there wasn’t a blog post): CUDA.jl 3.3 ⋅ JuliaGPU
Key highlights:
- CuArray support for isbits union element types (useful for `nothing` and `missing`; see the sketch after this list)
- Ability to emit debug and location information for GPU code
- Support for CUDA’s semantic versioning (so you can use CUDA 11.3 on a driver for 11.0)
- High-level wrappers for the CUDA graph APIs
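For the union-type support, something like this should now work (a minimal sketch; the array contents are just for illustration):

using CUDA

a = CuArray([1f0, missing, 3f0])   # eltype is Union{Float32, Missing}
b = coalesce.(a, 0f0)              # broadcasting handles the union eltype on the GPU
sum(b)                             # 4.0f0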
How should the new graph APIs be applied to custom kernels?
Something like this?
@captured @cuda threads=Nth blocks=Nbl kernel(A)
And if I use a launch configuration, e.g.
ckernel = @cuda launch=false kernel(A)
config = launch_configuration(ckernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
ckernel(A; threads=threads, blocks=blocks)
how should I apply the @captured macro?
Like this:
@captured begin
ckernel = @cuda launch=false kernel(A)
config = launch_configuration(ckernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
ckernel(A; threads=threads, blocks=blocks)
end
or like this:
ckernel = @cuda launch=false kernel(A)
config = launch_configuration(ckernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
@captured ckernel(A; threads=threads, blocks=blocks)
Thank you.
The macro doesn’t care; just encapsulate any chunk of code that performs a launch. See the tests, for example: CUDA.jl/graph.jl at 71d5f39daf4ffcb8d104f5a10a26f096b8150695 · JuliaGPU/CUDA.jl · GitHub. Note that graph recording doesn’t support all CUDA APIs, but the occupancy API shouldn’t be a problem (the broadcast example in the blog post uses the occupancy API to determine a launch configuration).
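Concretely, something along these lines should work (a sketch reusing the names from your snippet, wrapped in a hypothetical loop since repeated execution is where graph caching pays off):

for step in 1:nsteps
    @captured begin
        ckernel = @cuda launch=false kernel(A)
        config = launch_configuration(ckernel.fun)
        threads = min(N, config.threads)
        blocks = cld(N, threads)
        ckernel(A; threads=threads, blocks=blocks)
    end
end

The first iteration records and instantiates a graph; subsequent iterations update and relaunch it, which is where the performance benefit comes from.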