I would like to catch up on what has been accomplished, which sorts of capabilities are well supported, and where Julia's GPU adventure goes next.
On the native codegen front: I've been working on (1) gradually improving both Julia language and CUDA framework/hardware support, and, more importantly, (2) making this support usable on top of julia/master. This work is nearing completion, and I hope/expect a technical preview to be available with 0.6. I'll be writing a blog post documenting all this in the near future.
For a small demo of what's possible, see e.g. this example of a parallel reduction. This is still low-level parallel programming, but that's the goal. Other packages can use this support to create GPU abstractions, e.g. GPUArrays.jl.
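To give a flavor, a stripped-down single-block sum reduction in that style might look something like this. This is my own sketch, not the linked example, and it assumes the current preview API (`@cuStaticSharedMem`, `sync_threads`, `@cuda`), which may still change:

```julia
using CUDAdrv, CUDAnative

# single-block tree reduction: 256 threads cooperatively sum 256 values
function sum_kernel(a, b)
    i = threadIdx().x
    shared = @cuStaticSharedMem(Float32, 256)
    @inbounds shared[i] = a[i]
    sync_threads()
    s = blockDim().x ÷ 2
    while s > 0
        if i <= s
            @inbounds shared[i] += shared[i + s]   # fold upper half into lower half
        end
        sync_threads()
        s ÷= 2
    end
    if i == 1
        @inbounds b[1] = shared[1]   # thread 1 writes the final sum
    end
    return nothing
end

a = rand(Float32, 256)
d_a, d_b = CuArray(a), CuArray(zeros(Float32, 1))
@cuda (1, 256) sum_kernel(d_a, d_b)
@assert Array(d_b)[1] ≈ sum(a)
```

The real example handles arbitrary lengths and multiple blocks; this only shows the shared-memory pattern.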
For me personally, the most exciting development has been the Julia → NVPTX compiler that @maleadt and I are working on at CUDAnative, and the changes around that. Also quite exciting is the work by @sdanisch on GLVisualize, and the experimental work on GPUArrays.
Sadly, OpenCL.jl and the bindings to CUDA have seen a bit of a stagnation period, due to lack of time.
However, I can say that the CUDArt.jl bindings work just fine: I have a computer that has run pretty much constantly over the last few months, using them to churn away on a problem. If you're willing to write the CUDA kernels, all seems well. I can't wait for the convenience features, though!
There's a lot of fragmentation and duplication, though (especially between CUDAdrv and CUDArt), with wrappers that are often incompatible with each other, making them harder to use than necessary. The fact that there are two duplicate but distinct APIs doesn't help, of course.
I hadn't been aware there were two sets of bindings; I've only used CUDArt, which worked OK for me. Is there any key difference from CUDAdrv?
What kind of ops will CUDAnative support? E.g., would `(Y)->sum(Y.^2, 1)` give a vector of sums of squares? Or will it support only ops that map easily to CUDA libraries?
- CUDArt vs CUDAdrv: you can thank NVIDIA for this, as they expose two mostly identical APIs for historical reasons. The slightly higher-level runtime API is even built on top of the lower-level driver API. We could unify this at the Julia package level (i.e., merge CUDArt.jl and CUDAdrv.jl), but that would take quite some engineering effort and might break some use cases (e.g., some CUDA-mimicking libraries, like rCUDA, only support the runtime API).
I've recently been working on CUDAdrv because I need some of the low-level driver API calls for CUDAnative. I think it has somewhat more solid foundations than CUDArt, for example for tracking device allocations with respect to the GC. However, CUDArt is more stable, has much better API coverage, and is used by / interoperates with other GPU packages. (The first sketch below this list shows what each style looks like.)
- CUDAnative will allow arbitrary functions to be compiled and executed on the GPU; it will not be limited to pre-implemented CUDA libraries. The only limitation will be the subset of the Julia language that is supported (for now: no exceptions, no GC, no Julia runtime calls, etc.). However, the abstraction level will be similar to that of CUDA: define a kernel function, upload args, call `@cuda (blocks, threads) kernel_function(args...)`, download args (see the second sketch below). Higher-level, library-like functionality will need to be part of other packages, like GPUArrays.jl.
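To illustrate the split from the first point, here's roughly what device setup looks like through each package, sketched from their READMEs; the exported names are from memory and may have drifted:

```julia
# CUDAdrv.jl: driver-API style, with explicit device/context management
using CUDAdrv
dev = CuDevice(0)
ctx = CuContext(dev)
# ... allocate CuArrays, launch kernels, then clean up the context ...

# CUDArt.jl: runtime-API style; devices() scopes setup and teardown for you
using CUDArt
devices(dev -> capability(dev)[1] >= 2) do devlist
    d_a = CudaArray(rand(Float32, 16))   # upload to the device
    a = to_host(d_a)                     # download back to the host
end
```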
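And to make the second point concrete, here is a minimal vector addition along those lines, adapted from the CUDAnative examples (the exact constructors and launch syntax are still preview material and may change):

```julia
using CUDAdrv, CUDAnative

# kernel: each thread adds one pair of elements
function kernel_vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    c[i] = a[i] + b[i]
    return nothing
end

len = 512
a, b = rand(Float32, len), rand(Float32, len)
d_a, d_b = CuArray(a), CuArray(b)           # upload args
d_c = similar(d_a)

@cuda (1, len) kernel_vadd(d_a, d_b, d_c)   # launch 1 block of 512 threads

c = Array(d_c)                              # download result
@assert c ≈ a + b
```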