Choosing a GPU interface

I’ve never done any real GPU programming before, but I’d like to start putting together some basic filters (sliding-window convolutions and the like) that can be optimized for a GPU. I work with a cluster that has CUDA support and a Mac Pro that has AMD GPUs, so it would be nice to have something that works with both OpenCL and CUDA. I know there’s GPUArrays.jl, but ArrayFire currently looks a lot more featureful.

Are there advantages to one or the other and are there any opinions/experiences as to which is better?

Last time I checked, CLArrays.jl didn’t work on Julia v1, so your only bet is CuArrays.jl, which means CUDA for now. There is a more beginner-friendly (gentle) intro


For now I’d follow @xiaodai’s recommendation; CLArrays.jl unfortunately does not yet support Julia 1.x, so I can’t recommend you use it. You can still of course write kernels with OpenCL.jl directly, but that’s not something I’d personally find enjoyable (since you’ll be writing them in OpenCL C).

Going forward, I hope to have AMDGPU support working in the next few months, although getting to the level of functionality and reliability of Julia’s CUDA ecosystem is going to take a while. Regardless, I’ll bookmark this thread so that I can provide an update once something akin to CuArrays.jl is available and working for AMD GPUs.


Is there any reason ArrayFire isn’t as good an option? I thought it provided OpenCL and CUDA support.

I keep ArrayFire.jl up to date, but for some reason few people use it.


At one point this came up in passing with Tim Holy on a GitHub issue. It looks like ArrayFire is open source and built on CUDA and OpenCL, so I would have figured it had most of the features either of those would have.

I’m assuming the difficulty of programming new filters is fundamentally a problem with GPU programming in general, due to differences in indexing for sliding windows and such, rather than something unique to a particular GPU API. As the Cxx.jl project approaches maturity, would it be conceivable for existing C++ bindings (like this) to handle much of the GPU indexing needed to create custom, performant filters?
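To make the indexing concern concrete, here is a minimal CPU-only sketch in plain Python of how a sliding-window filter is usually split up for a GPU: each output element is computed independently from its own index, which is the role a thread or work-item index plays in a CUDA or OpenCL kernel. The names `box_filter_at` and `box_filter` are made up for illustration, and clamping at the borders is just one assumed boundary policy; none of this is taken from ArrayFire, CuArrays.jl, or any other library mentioned in this thread.

```python
def box_filter_at(signal, i, radius):
    """Compute output element i of a 1-D box (mean) filter.

    Hypothetical helper: on a GPU, one thread would typically run this
    body with i derived from its thread/work-item index.
    """
    n = len(signal)
    total = 0.0
    for offset in range(-radius, radius + 1):
        # Assumed boundary policy: clamp out-of-range neighbours to the edge.
        j = min(max(i + offset, 0), n - 1)
        total += signal[j]
    return total / (2 * radius + 1)


def box_filter(signal, radius):
    # Sequential stand-in for launching one GPU thread per output element.
    return [box_filter_at(signal, i, radius) for i in range(len(signal))]
```

Calling `box_filter([1, 2, 3, 4, 5], 1)` averages each element with its two neighbours. Because no output element depends on another, the loop in `box_filter` is the part a GPU would run in parallel, and the border handling inside `box_filter_at` is where most of the indexing trouble tends to come from, whichever API is used.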

Edit: BTW, I know this is a complicated (to me at least) area of computing that is absolutely necessary for keeping Julia relevant today, but that few people have the skill set to maintain. I really appreciate all the work everyone is doing in this area and all the answers provided.

Yes, I guess that’s the downside with ArrayFire - if you need a filter / operation that is not implemented already you are pretty much out of luck.