Partly because I have an AMD GPU, the only GPU package I have been able to get working is ArrayFire.jl.
    ArrayFire v3.7.0 (OpenCL, 64-bit Linux, build a4485443)
    AMD: Ellesmere, 7999 MB
    -1- INTEL: AMD Ryzen 7 2700X Eight-Core Processor , 32178 MB
Generally, the benchmarks for standard matrix operations look considerably better on the GPU than on the CPU. The next step for me was to try to use my GPU with Flux.jl. I took a fairly naive approach, based on my understanding of how CUDA interfaces with Flux. As far as I can tell, this means converting the arrays of the network into ArrayFire arrays (which, I think, requires first converting them to regular, untracked arrays), which is exactly what I did:
    model = mapleaves(AFArray, mapleaves(Tracker.data, Chain(
        Dense(24, 24, σ),
        Dense(24, 24),
        softmax)))
It’s a simple, small network that I’m only using for benchmarking, but I saw similar results across a number of different layer sizes and layer counts. For this to work, I had to extend the AFArray function to accept a few function types, which I tried to do as simply as possible by returning each function unchanged:
    ArrayFire.AFArray(func::typeof(σ)) = σ
    ArrayFire.AFArray(func::typeof(identity)) = identity
    ArrayFire.AFArray(func::typeof(softmax)) = softmax
And this worked, in the sense that it evaluated without error. Unfortunately, the GPU version was far slower for a single evaluation: 0.048556 seconds versus 0.000040 seconds on the CPU. Is there something that I’m doing wrong here, or something that could be sped up?
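One thing worth ruling out when timing a single evaluation is compilation overhead on the first call. Below is a minimal sketch of separating first-call time from steady-state time, using only Base Julia. The names `layer` and `time_steady` are hypothetical stand-ins for illustration; they are not the Flux model or any ArrayFire.jl API from this post.

```julia
# Hypothetical stand-in for a small dense layer (NOT the Flux model above).
const W = rand(Float32, 24, 24)
const b = rand(Float32, 24)
layer(x) = tanh.(W * x .+ b)

# Hypothetical helper: average time per call after one warm-up call.
function time_steady(f, x; n = 1_000)
    f(x)                      # warm-up, so compilation is not timed
    t = @elapsed for _ in 1:n
        f(x)
    end
    return t / n
end

x = rand(Float32, 24)

# The first call includes JIT compilation for this argument type.
t_first = @elapsed layer(x)

# Later calls reuse the compiled code.
t_steady = time_steady(layer, x)

println("first call:   ", t_first, " s")
println("steady state: ", t_steady, " s")
```

If the steady-state number is still much worse on the GPU, compilation is not the cause. GPU backends often run asynchronously, so a fair GPU timing may also need an explicit synchronization before the timer stops; that call is left out here because it depends on the backend.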