AOT: Is it possible to enable AOT compilation? It takes around 10 seconds to compile the kernel, and compilation happens every time even though I rarely change the kernel.
Portability: When I run the code on a machine without CUDA, an error occurs at the first `using CUDAdrv, CUDAnative`. Currently my strategy is to keep the CUDAnative code in separate files and only include them when the machine has CUDA. Is there a more elegant strategy to make the code run on machines both with and without CUDA, other than selective inclusion of source files?
That shouldn’t be the case. Yes, the first compilation takes a while, but that is mainly due to the CUDAnative.jl and LLVM.jl packages getting compiled to native code. Future developments on the Julia compiler should improve this.
Subsequent compilation of new or modified kernels should take less than 1 second, depending on the complexity.
Yes, conditional modules are a weak point of the package infrastructure. @MikeInnes has also been dealing with this. What we generally do is have conditional code depending on whether CUDAnative is installed (by checking Pkg.installed) and whether a device is available, checked at package build time by setting a global flag. That way you avoid nasty errors from having CUDAnative listed in REQUIRE but throwing a fit during Pkg.build.
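As a sketch of that build-time flag approach (the file names and the `nvidia-smi` probe here are illustrative assumptions, not what any particular package actually does), a `deps/build.jl` could record availability like so:

```julia
# deps/build.jl -- illustrative sketch of the build-time flag approach.
# Assumption: we probe for CUDA via Pkg.installed and `nvidia-smi`;
# real packages may probe differently (e.g. via CUDAdrv).
has_cuda = try
    Pkg.installed("CUDAnative") != nothing && success(`nvidia-smi`)
catch
    false
end

# Write a flag file that the package can include at load time to
# decide whether to pull in its GPU code paths.
open(joinpath(@__DIR__, "cuda_flag.jl"), "w") do io
    println(io, "const CUDA_AVAILABLE = ", has_cuda)
end
```

The package then does `include(joinpath("..", "deps", "cuda_flag.jl"))` and branches on `CUDA_AVAILABLE`, so a missing GPU degrades gracefully instead of erroring at `using` time.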
RE portability: As far as possible, you should just deal with AbstractArrays. If you do need to special-case the GPU, e.g. with a custom kernel, you can handle that with Requires.jl:
@require CuArrays begin
    using CuArrays, CUDAnative
    # overload some functions for CuArray
end
The GPU workflow for your library will look something like:
using Foo, CuArrays
my_data = cu(my_data)
Foo.excellent_computations(my_data)
This is a pretty reasonable way to ask for GPU support, and it will just work if you do things as above – as well as being much more robust than things like checking Pkg.installed at compile time.