Can Flux handle multiple GPUs?

I would also take a closer look at the specifications and software requirements if you are working with older GPU cards.

As for training on CPUs, my experience suggests that Julia's Distributed capabilities can be more useful than multithreading (a minimal sketch of what I mean follows below). I once attempted to train a fairly demanding deep learning model on CPUs only. Here is some info: link. The disclaimer is that this was almost my very first contact with Julia, so not everything there may be correct, but in general it should not be too bad. Depending on how you look at it, I recall that the first, major part of the training took about the same time on high-end CPUs (2 x Platinum 8358) as on a high-end GPU (V100). [I wrote that, but for various reasons it is a very simplified comparison: there were some GPU inefficiencies, and the cost of the CPUs, board, and RAM probably exceeded the cost of the high-end GPU.]

I personally never tried FluxMPI.jl, and my attempts with bare MPI.jl were not so great, but that was probably down to the model I used and my basic knowledge of MPI (the second sketch below shows the general pattern). There was a very interesting presentation at the last JuliaCon, "Scaling up Training of Any Flux.jl Model Made Easy | Dhairya Gandhi | JuliaCon 2022". As for DaggerFlux.jl and Dagger.jl, I believe that should you pursue this road, you will receive excellent support, especially on the public forums. On directed acyclic graphs, you may also take a look at the very interesting "Introduction to Graph Computing | JuliaCon 2022 | Yadong Li".
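To make the Distributed-over-multithreading point concrete, here is a minimal data-parallel sketch (not what I actually ran, just the general idea): each worker computes gradients on its own minibatch via `pmap`, and the main process averages the gradient trees and applies the update. It assumes a recent Flux with the explicit `Flux.setup`/`Flux.update!` API and Functors' multi-tree `fmap`; the model, loss, and synthetic batches are made up for the example.

```julia
using Distributed
addprocs(4)                          # e.g. one worker per group of cores

@everywhere using Flux

# Per-shard loss and gradient, defined on every process
@everywhere shard_loss(m, x, y) = Flux.Losses.mse(m(x), y)
@everywhere shard_grad(m, x, y) = gradient(m̂ -> shard_loss(m̂, x, y), m)[1]

# Add two gradient trees leaf-by-leaf; `nothing` marks fields with no gradient
addg(a, b) = a .+ b
addg(::Nothing, ::Nothing) = nothing
sumgrads(gs) = reduce((a, b) -> Flux.fmap(addg, a, b), gs)

model = Chain(Dense(10 => 32, relu), Dense(32 => 1))
opt   = Flux.setup(Adam(1f-3), model)

# Synthetic minibatches just so the sketch runs end to end
batches = [(randn(Float32, 10, 64), randn(Float32, 1, 64)) for _ in 1:8]

for group in Iterators.partition(batches, nworkers())
    # One minibatch per worker; the model is shipped with each call,
    # which is wasteful — real code would keep a replica on each worker.
    grads = pmap(((x, y),) -> shard_grad(model, x, y), collect(group))
    avg   = Flux.fmap(g -> g === nothing ? nothing : g ./ length(grads),
                      sumgrads(grads))
    Flux.update!(opt, model, avg)
end
```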
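And here is a hedged sketch of the bare MPI.jl pattern I was alluding to: flatten the model with `Flux.destructure`, let each rank compute a gradient on its own data, and average the gradients with `Allreduce`. The data is synthetic and the optimizer is plain SGD to keep it short; a real setup would load a rank-specific shard of real data and use a proper optimizer.

```julia
# Run with e.g.: mpiexec -n 4 julia --project script.jl
using MPI, Flux

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

model = Chain(Dense(10 => 32, relu), Dense(32 => 1))
θ, re = Flux.destructure(model)   # flat parameter vector + rebuilder
MPI.Bcast!(θ, 0, comm)            # make sure every rank starts identically

η = 1f-3
for step in 1:100
    # Each rank should draw its own shard of the data; synthetic here
    x = randn(Float32, 10, 64)
    y = randn(Float32, 1, 64)

    g = gradient(p -> Flux.Losses.mse(re(p)(x), y), θ)[1]

    MPI.Allreduce!(g, +, comm)    # sum gradients across all ranks
    g ./= nproc                   # ... and average them
    θ .-= η .* g                  # plain SGD step on the flat vector

    rank == 0 && step % 20 == 0 && @info "step $step done"
end

MPI.Finalize()
```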
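For the Dagger.jl / DAG angle, the basic building block looks like this (just the task-graph primitive, not DaggerFlux.jl itself):

```julia
using Dagger

# Each @spawn creates a node in a task DAG; Dagger schedules the graph
# across whatever threads and workers are available.
a = Dagger.@spawn sum(randn(1_000))
b = Dagger.@spawn sum(randn(1_000))
c = Dagger.@spawn a + b       # runs only after `a` and `b` complete
fetch(c)
```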

I believe the idea of CPU training is quite valid, especially for large models, cloud environments, and discounted pricing, and it should become even more realistic with the next CPU generation. However, this is generally a contrarian view and probably a little futuristic. I have seen several threads on this topic recently and have tried to discuss the subject several times myself, but as I wrote, ML/DL on CPUs seems to be a niche topic and in most cases is probably not a favored option at present.