I haven’t looked at Triton in detail, but it looks like a nice DSL with great performance characteristics. Matching cuBLAS on GEMM in particular is no easy feat. With CUDA.jl etc. we focus on the lower-level programming experience, exposing all of the hardware’s capabilities, and a DSL like Triton could be built on top of that. KernelAbstractions.jl is pretty much that, and could probably learn from the abstractions and techniques used in Triton.
Sounds just right