Using cuBLASDx in Julia

I am really interested in using cuBLASDx, the CUDA API for performing BLAS calculations inside CUDA kernels. However, since it is not part of the CUDA Toolkit and has to be downloaded separately, there does not seem to be an interface for it in CUDA.jl.
Is there an alternative way to use it in Julia?
And what about the other libraries in the MathDx package (e.g., cuSolverDx)?

cuBLASDx is a CUDA C++ header-only library.

Sadly, that makes it inaccessible from CUDA.jl.
I would recommend looking at GemmKernels.jl; maybe you can write an API that provides a device-side BLAS in pure Julia?

What about cuSolverDx, which is also part of the MathDx package but is not a header-only library?

https://docs.nvidia.com/cuda/cusolverdx/get_started/installation.html

It is implemented as a header + a static library.

The closest thing we currently support is libcudadevrt (see CUDA.jl/src/compiler/compilation.jl at f3b3f8bff81d94b028abd92a0004971febd8dc87 in JuliaGPU/CUDA.jl on GitHub).

So yes, it’s technically feasible to access that functionality, but you will need to get your hands dirty and prototype a suitable Julia binding.

What would make it easier to work with cuBLASDx/cuSolverDx? I assume dealing with C++ is not optimal. Would a C API that generates pieces you can link be better?

I would say that anything will work as long as it can be called inside a kernel written in Julia to perform linear algebra routines and linear system solving.

NVIDIA has started to make available a lot of device-side extensions for mathematical operations (see the whole MathDx package, which also includes FFT and RNG), even though they are not part of the CUDA Toolkit. Having them available in the Julia environment as well would be a great plus.

@llukas yeah, that would certainly make things easier. Then at least I don’t have to worry about the ABI and can generate wrappers easily.
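
To illustrate the ABI point, here is a minimal host-side sketch in plain C++ of what such a C entry point could look like. It is not the cuBLASDx or cuSolverDx API: `gemm_tile` and `hypo_gemm_2x2x2_f32` are hypothetical names, and the template only stands in for a header-only routine whose sizes are fixed at compile time.

```cpp
#include <cstddef>

// Stand-in for a header-only, templated C++ routine (hypothetical, not cuBLASDx).
// Sizes are template parameters, so every instantiation gets its own mangled
// C++ symbol, which is awkward to call from another language.
template <std::size_t M, std::size_t N, std::size_t K>
void gemm_tile(const float* a, const float* b, float* c) {
    // Row-major C = A * B with A (M x K), B (K x N), C (M x N).
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += a[i * K + k] * b[k * N + j];
            }
            c[i * N + j] = acc;
        }
    }
}

// Hypothetical C entry point: one fixed instantiation, an unmangled symbol
// name, and plain pointers only, so a wrapper generator has nothing
// C++-specific to deal with.
extern "C" void hypo_gemm_2x2x2_f32(const float* a, const float* b, float* c) {
    gemm_tile<2, 2, 2>(a, b, c);
}
```

The template itself cannot be called from outside C++ without instantiating it, while the `extern "C"` function has a stable name and a plain-pointer signature. A real device-side version would additionally have to be compiled for the GPU and linked into the kernel, which this sketch does not attempt.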