Deployment questions: AOT and Portability

AOT: Is it possible to enable AOT compilation? It takes around 10 seconds to compile the kernel, and compilation happens every time, even though I rarely change the kernel.

Portability: When I run the code on a machine without CUDA, an error occurs at the first “using CUDAdrv, CUDAnative”. My current strategy is to keep the CUDAnative code in separate files and only include them when the machine has CUDA. Is there a more elegant strategy to have the code run on machines both with and without CUDA, other than selective inclusion of source files?

That shouldn’t be the case. Yes, the first compilation takes a while, but that is mainly due to the CUDAnative.jl and LLVM.jl packages themselves being compiled to native code. Future developments in the Julia compiler should improve this.

Subsequent compilation of new or modified kernels should take less than a second, depending on their complexity.
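
You can see this by timing the same launch twice. Here is a minimal sketch based on the classic vadd example from the CUDAnative README (the 0.6-era @cuda launch syntax is assumed):

using CUDAdrv, CUDAnative

function kernel_vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

d_a = CuArray(rand(Float32, 16))
d_b = CuArray(rand(Float32, 16))
d_c = similar(d_a)

@time @cuda (1, 16) kernel_vadd(d_a, d_b, d_c)    # first call: compiles the kernel
@time @cuda (1, 16) kernel_vadd(d_a, d_b, d_c)    # second call: hits the compilation cache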

Yes, conditional modules are a weak point of the package infrastructure. @MikeInnes has also been dealing with this. What we generally do is have conditional code depending on whether CUDAnative is installed (by checking Pkg.installed) and whether a device is available, both checked at package build time and recorded in a global flag. That way you avoid nasty errors from listing CUDAnative in REQUIRE and having it throw a fit during Pkg.build.
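
Such a build-time check might look roughly like this (a minimal sketch; the file names, the Foo package name, and the device probe are illustrative, not taken from any particular package):

# deps/build.jl -- runs during Pkg.build("Foo")
installed = try
    Pkg.installed("CUDAnative") != nothing
catch
    false
end

available = installed && try
    @eval using CUDAdrv
    CUDAdrv.CuDevice(0)    # throws when no driver or no device is present
    true
catch
    false
end

# record the result so the package can read it at load time
open(joinpath(@__DIR__, "availability.jl"), "w") do io
    println(io, "const CUDA_AVAILABLE = $available")
end

The package itself then reads the flag and conditionally includes the GPU code:

# src/Foo.jl
include(joinpath(dirname(@__DIR__), "deps", "availability.jl"))
if CUDA_AVAILABLE
    include("gpu.jl")    # all CUDAnative-dependent code lives here
end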

I’m not sure whether this applies here, but check out Revise.jl: [ANN] Higher productivity (fewer Julia restarts) with Revise.jl.

RE portability: As far as possible, you should just deal with AbstractArrays. If you do need to special-case the GPU, e.g. with a custom kernel, you can handle that with Requires.jl:

@require CuArrays begin
  using CuArrays, CUDAnative
  # overload some functions for CuArray
end
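
Concretely, the package side might look something like this (a minimal sketch; Foo and excellent_computations are placeholder names, and the GPU method is just illustrative):

# src/Foo.jl
module Foo

using Requires

# generic fallback: a scalar loop that works for any AbstractArray on the CPU
function excellent_computations(xs::AbstractArray)
    acc = zero(eltype(xs))
    for x in xs
        acc += x^2
    end
    return acc
end

# this block is only evaluated once the user loads CuArrays
@require CuArrays begin
    using CuArrays, CUDAnative
    # GPU method: broadcast and reduction stay on the device,
    # avoiding the scalar loop above
    excellent_computations(xs::CuArray) = sum(xs .^ 2)
end

end # module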

The GPU workflow for your library will look something like:

using Foo, CuArrays

my_data = cu(my_data)
Foo.excellent_computations(my_data)

This is a pretty reasonable way for users to ask for GPU support, and it will just work if you do things as above. It is also much more robust than things like checking Pkg.installed at compile time.

@drjoke seems to be using CUDAdrv/CUDAnative explicitly, not an array-based interface. But yeah, if an array-based approach is possible, it is the better one.

You can also write your code against GPUArrays to circumvent this issue, since it is hardware independent but still allows you to write GPU kernels.

See: Writing extendable and hardware agnostic GPU libraries | by Simon Danisch | techburst
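
For reference, a hardware-agnostic kernel through GPUArrays looks roughly like this (a sketch based on the gpu_call API described in that article; treat the exact names and signatures as assumptions and check the article or the GPUArrays README for your version):

using GPUArrays

# write f.(B) into A; the same kernel runs on whichever backend A lives on
function apply_kernel(state, f, A, B)
    i = linear_index(state)    # backend-agnostic linear thread index
    if i <= length(A)
        @inbounds A[i] = f(B[i])
    end
    return
end

B = GPUArray(rand(Float32, 1024))
A = similar(B)
gpu_call(apply_kernel, A, (x -> x * x, A, B))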