Flux allows for parallel training with multiple GPUs (see the GPU Support · Flux docs).
In my use case, I need to run inference of a single model on multiple GPUs. Is there a way to load balance inference calls to the model so all GPUs are maximally utilized?
I could be wrong, but I expect you would have to roll your own here. Out of curiosity, what sort of model and what sort of inference is it?
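If you do roll your own, one simple approach is to keep a copy of the model on each GPU and round-robin incoming batches across them in separate tasks (CUDA.jl's active device is task-local). A rough sketch, assuming `build_model()` and `batches` stand in for your GNN constructor and your inputs:

```julia
using Flux, CUDA

# Placeholders: build_model() constructs your GNN, batches is a vector of inputs.
devs = collect(CUDA.devices())

# One replica of the model per GPU (same weights, different devices).
replicas = map(devs) do dev
    CUDA.device!(dev)
    gpu(build_model())
end

# Round-robin the batches over the GPUs; each task pins itself to one device.
results = Vector{Any}(undef, length(batches))
@sync for (i, batch) in enumerate(batches)
    d = mod1(i, length(devs))
    Threads.@spawn begin
        CUDA.device!(devs[d])
        results[i] = cpu(replicas[d](gpu(batch)))
    end
end
```

This is just static round-robin rather than true load balancing, but for uniform batch sizes it should keep all GPUs busy. For uneven workloads you could instead pull batches from a shared `Channel` with one worker task per device.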
I’m just running a small GNN on a graph classification task.