Can Flux handle multiple GPUs?

It's hard to comment without an exact example, but regarding the 8 BLAS threads, please see: BLAS performance testing for Julia 1.8. Not sure if this is of interest, but you can also easily change the BLAS backend thanks to libblastrampoline and use another library, e.g. MKL, instead of the default OpenBLAS. Depending on the matrix size and the hardware, this can sometimes provide significant benefits. BLAS performance can be checked with e.g. BLASBenchmarksCPU.jl or BLASBenchmarksGPU.jl. Hope it helps.
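
In case it is useful, here is a rough sketch of how one might check the active BLAS backend and compare a few thread counts using only the LinearAlgebra standard library. It assumes Julia 1.7 or newer (for `BLAS.get_config`); the helper `time_matmul`, the matrix size and the thread counts are made up for illustration, and a single `@elapsed` call is only a crude timing, not a proper benchmark.

```julia
using LinearAlgebra

# Show which BLAS library libblastrampoline currently forwards to
# (OpenBLAS by default; requires Julia 1.7 or newer).
println(BLAS.get_config())

# Show how many threads BLAS is using right now.
println("BLAS threads: ", BLAS.get_num_threads())

# Hypothetical helper: time one n-by-n Float64 matrix multiplication
# with a given number of BLAS threads, then restore the old setting.
function time_matmul(n::Integer, nthreads::Integer)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(nthreads)
    A = rand(n, n)
    B = rand(n, n)
    A * B                      # warm-up run so compilation is not timed
    t = @elapsed A * B
    BLAS.set_num_threads(old)
    return t
end

# Illustrative thread counts; adjust to your hardware.
for nt in (1, 2, 4)
    println(nt, " thread(s): ", time_matmul(500, nt), " s")
end

# To try MKL instead (assumes the MKL.jl package is installed),
# load it in a fresh session before benchmarking:
# using MKL
```

If MKL.jl is installed, uncommenting `using MKL` should make `BLAS.get_config()` report MKL instead of OpenBLAS, so the same timings can be compared across backends. For anything more careful than this, the benchmark packages mentioned above are probably the better tool.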