I am reading the most up-to-date documentation I can find. At the very least, the principles of writing a completely backend-agnostic GPGPU Julia program aren’t as obvious from the documentation as they could be, and even after what I’ve learned from this thread I feel like something is still missing.
(On the upside, if it’s just a documentation matter, then I’ll be happy to give a hand where I can, and if there is indeed still some aspect missing maybe I can contribute to that too!)
To show what I mean, let me try with an example. Let me be clear up front that what follows is not meant as a dig at the authors of any of the packages (I can easily guess the titanic effort that goes into their development). Rather, I want to illustrate what someone coming from my background faces when trying to write a vendor-neutral GPU program in Julia.
Let’s say that I want to write a simple Julia program that runs a simple kernel on GPU. The program should be such that
- a device is selected automatically if the user doesn’t specify one (possibly a performant one, but this is just a bonus);
- the user can override the automatic selection, and should be able to choose any computational device available on the machine and supported by the runtime;
- bonus points, the software informs the user about the device that is going to be used.
Aside from the device selection, the program does something very simple:
- allocates an array (on GPU);
- initializes the array (on GPU);
- verifies (on host) that the array is correctly initialized, informing the user in case of a mismatch (showing the first mismatching index with the expected and computed values);
- bonus points, it shows the kernel runtime in case of successful execution. (A rough sketch of this program shape follows.)
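To make the requirements concrete, here is a minimal sketch of the shape I have in mind, in Julia. Note that `select_device`, `allocate_on` and `initialize!` are entirely hypothetical (they exist in no package I know of); they mark exactly the gap the rest of this post is about. The initialization pattern `A[i] == i` is chosen arbitrarily, and `initialize!` is assumed to synchronize before returning so the timing is meaningful.

```julia
# Hypothetical vendor-neutral program shape; none of the three helpers
# below exists today, they are what I'm looking for.
backend = select_device()            # automatic pick + user override
@info "Running on" backend           # bonus: report the chosen device

N = 1024
A = allocate_on(backend, Float32, N) # allocate on the device
t = @elapsed initialize!(backend, A) # run the init kernel on the device

host = Array(A)                      # copy back for host-side verification
bad = findfirst(i -> host[i] != Float32(i), eachindex(host))
if bad === nothing
    @info "Success" kernel_seconds = t   # bonus: kernel runtime
else
    @error "Mismatch" index = bad expected = Float32(bad) computed = host[bad]
end
```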
I have some sample code to do this in SYCL here (the sample-select program allows selection of the platform and device by assigning the corresponding ordinals to the environment variables `SYCL_PLATFORM` and `SYCL_DEVICE`; the details of how the user specifies the device override are, hopefully, not particularly relevant).
One of the key points I want to highlight about the above-linked SYCL examples is that at no point in my SYCL code is there any mention of any specific backend. Which backends are available, and which devices they expose to the user, is managed entirely by the SYCL runtime (e.g. if I compile the code against different runtimes, Codeplay’s vs Intel’s vs AdaptiveCpp, I may end up seeing different devices, but that’s a runtime limitation, not a limitation of my code).
So let’s say that I want to port this to Julia.
I go to juliagpu.org and in the Learn section I find that the solution to vendor-neutral GPU programming is KernelAbstractions.jl; so far so good. The 3-hour workshop video has a section on its use, and from it I learn that, at least for memory allocations, “vendor neutrality” is achieved with an `if` cascade that remaps constants (sketched below). Not exactly what I’m looking for, but maybe the situation has changed since 2021, when the seminar was recorded.
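For reference, the pattern shown in the workshop is essentially the following (paraphrased from memory, not verbatim from the workshop code; the `USE_CUDA` flag is a stand-in, the workshop derives the condition differently):

```julia
# Backend "selection" via an if cascade that remaps an array-type constant;
# supporting another backend means adding another branch to user code.
const use_cuda = get(ENV, "USE_CUDA", "false") == "true"

if use_cuda
    using CUDA
    const ArrayType = CuArray
else
    const ArrayType = Array   # CPU fallback
end

# allocations then go through the remapped constant:
A = ArrayType{Float32}(undef, 1024)
```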
So I go read the KA QuickStart documentation (I assume this is the latest version) and once again I’m shown different ways to do things for the CPU, CUDA, AMDGPU and oneAPI backends, each using its own vendor-specific array type to allocate memory (see the sketch below).
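Concretely, the per-backend allocations look something like this (not the QuickStart code verbatim, but the gist of it; the array types are the real ones from each package):

```julia
using CUDA, AMDGPU, oneAPI   # each only loadable if actually installed

# One vendor-specific array type per backend:
A_cpu  = Array{Float32}(undef, 1024)     # CPU: plain Base array
A_cuda = CuArray{Float32}(undef, 1024)   # NVIDIA, via CUDA.jl
A_roc  = ROCArray{Float32}(undef, 1024)  # AMD, via AMDGPU.jl
A_one  = oneArray{Float32}(undef, 1024)  # Intel, via oneAPI.jl
```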
(FWIW, the same goes for AcceleratedKernels.jl: the README shows an example that explicitly depends on one specific backend —Metal).
The first mention I see of a vendor-neutral way to handle memory allocation is the `memcopy` kernel example for KA, which shows the existence of `KernelAbstractions.zeros` and `KernelAbstractions.ones`. To learn about the non-initializing `KernelAbstractions.allocate` I have to browse the API section.
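Put together, these do give a genuinely backend-agnostic way to allocate, once you hold a backend object:

```julia
using KernelAbstractions

backend = CPU()   # stand-in; could equally be a GPU backend object

A = KernelAbstractions.zeros(backend, Float32, 1024)     # zero-initialized
B = KernelAbstractions.ones(backend, Float32, 1024)      # one-initialized
C = KernelAbstractions.allocate(backend, Float32, 1024)  # uninitialized
```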
These allocation functions take the backend as a parameter. So it looks like I can write the kernel, the kernel-calling function and even the allocation management in a vendor-neutral way, once the backend has been selected. Excellent: this is pretty close to what I can do in SYCL, where the device-specificity can usually be relegated to the queue construction.
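For instance, the whole compute side of my example program can be written once against the KA API; nothing below names a vendor (a sketch, with the initialization pattern again chosen arbitrarily):

```julia
using KernelAbstractions

# Vendor-neutral kernel: fill A[i] with i.
@kernel function init!(A)
    i = @index(Global)
    @inbounds A[i] = eltype(A)(i)
end

# Vendor-neutral driver: works for any KA backend object.
function run_example(backend, N = 1024)
    A = KernelAbstractions.allocate(backend, Float32, N)
    init!(backend)(A; ndrange = N)
    KernelAbstractions.synchronize(backend)
    return A
end
```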
Now the backend selection is the only thing missing. In SYCL, writing a custom selector, while not entirely trivial, is rather straightforward and still vendor-neutral: you just iterate over the platforms (i.e. backends) announced by the SYCL runtime.
So now I go to the `utils.jl` file in the example, which is documented as the one choosing the backend. It’s an `if` cascade (with only two entries: CUDA is attempted, the fallback is the CPU; roughly paraphrased below).
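In other words, something along these lines (a paraphrase, not the verbatim file contents):

```julia
using KernelAbstractions, CUDA

# Paraphrase of the example's utils.jl logic: try CUDA, else fall back to CPU.
function pick_backend()
    if CUDA.functional()          # CUDA.jl's check for a usable NVIDIA GPU
        return CUDA.CUDABackend()
    else
        return CPU()
    end
end
```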
So unless I’m still missing something, what’s needed to implement the above requirements in JuliaGPU is, at the very least, a higher-level module (or something in KA itself) that tries to load each of the supported backends, determines which of them actually expose viable devices, selects one of these by some internal criteria, and also offers the user a way to choose a different backend and device if they so desire. (I say try-load because, rather than having all backends as hard dependencies, it’d be more sensible for users to install only the packages relevant to their hardware: e.g. no Metal.jl except on macOS, no AMDGPU.jl if there are no AMD GPUs, etc.) A rough sketch of what I mean follows.
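A sketch of the kind of module I have in mind, under the assumption that each backend package can be probed via its `functional()` predicate (which CUDA.jl, AMDGPU.jl, oneAPI.jl and Metal.jl all provide); the `JULIA_GPU_BACKEND` environment variable and the preference order are made up for illustration:

```julia
using KernelAbstractions

# Map each backend package to the name of its KA backend constructor.
# The list order doubles as the internal preference order.
const CANDIDATES = [
    :CUDA   => :CUDABackend,
    :AMDGPU => :ROCBackend,
    :oneAPI => :oneAPIBackend,
    :Metal  => :MetalBackend,
]

function available_backends()
    found = Pair{Symbol,Any}[]
    for (pkg, ctor) in CANDIDATES
        mod = try
            Base.require(Main, pkg)   # try-load; skip packages not installed
        catch
            continue
        end
        mod.functional() || continue  # skip backends without usable devices
        push!(found, pkg => getfield(mod, ctor)())
    end
    return found
end

function select_backend()
    # Hypothetical override mechanism, in the spirit of my SYCL_PLATFORM /
    # SYCL_DEVICE environment variables:
    wanted = get(ENV, "JULIA_GPU_BACKEND", nothing)
    for (pkg, backend) in available_backends()
        if wanted === nothing || String(pkg) == wanted
            @info "Selected backend" pkg
            return backend
        end
    end
    @info "No (matching) GPU backend found, falling back to the CPU"
    return CPU()
end
```

With something like this in place, the program from the start of this post would reduce to `backend = select_backend()` followed by the vendor-neutral KA code shown earlier.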
There’s more after this (for example, I’d appreciate a consistent interface to access device properties regardless of backend), but at the very least, and to answer @giordano’s question too, this is what I find is missing. (And truly, if there’s any way I can help, I’d be happy to give it a try.)