I’ve just tagged cuTile.jl v0.3, featuring:
- CUDA.jl integration. Launching a cuTile kernel is now just `@cuda backend=cuTile ...`.
- Better performance. We now match or outperform NVIDIA’s cuTile Python on every benchmark we ship.
- Much improved latency, with TTFX the same as with regular CUDA.jl kernels (~1.8s for a trivial kernel on my system).
- Random number generation, both host-level and in-kernel. Performance matches or beats cuRAND and the new GPUArrays.jl generator.
- Array slicing. `@view A[i:j, :]` now produces a sub-range TileArray you can pass to `ct.load`/`ct.store`.
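To give a flavor of how these pieces fit together, here is a hypothetical sketch combining the new launch syntax with slicing. Only `@cuda backend=cuTile`, `ct.load`/`ct.store`, and the `@view`-to-TileArray behavior come from this release; the kernel body and the exact load/store signatures are illustrative assumptions, not the real API:

```julia
using CUDA, cuTile
const ct = cuTile

# Hypothetical kernel: copy a tile of `src` into `dst`.
# The ct.load/ct.store signatures here are assumed for illustration.
function copy_tile(dst, src)
    tile = ct.load(src)    # load a tile from the (sliced) input
    ct.store(dst, tile)    # write it back to the output
    return
end

A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.zeros(Float32, 512, 1024)

# The @view slice becomes a sub-range TileArray that the
# load/store intrinsics understand.
Av = @view A[1:512, :]

# Launch through CUDA.jl using the new cuTile backend.
@cuda backend=cuTile copy_tile(B, Av)
```

See the linked write-up for real, benchmarked examples.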
A full write-up with code samples and benchmark numbers is available on juliagpu.org: cuTile.jl 0.3: CUDA.jl integration, and even better performance & latency ⋅ JuliaGPU
Upcoming webinar
If you’d like a guided tour, Andy Terrel (NVIDIA) and I are running a joint webinar on May 12, 2026 at 1 PM ET covering CUDA Tile’s design, how cuTile.jl is built on top of it, and several worked examples. Sign up here: cuTile.jl for High-Performance Computing in Julia - Event - JuliaHub