Does someone know how to correctly assign the GPUs in each node to the individual workers on that node? For instance, imagine I have 2 nodes, each with 2 GPUs. If I follow the procedure detailed in the CUDA.jl documentation, I imagine that the devices() function will return only the devices on the master node, while workers() will return the workers from all nodes.
I think --gpu-bind=single:1 is a better option: you can still have multiple processes on the same node if it has multiple GPUs, but each process sees exactly 1 GPU, and you don't need to do anything special in Julia; just use the GPU as normal.
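For concreteness, here is a minimal sbatch sketch of that setup. It assumes the 2-nodes-with-2-GPUs layout from the question; the script name and node/GPU counts are placeholders to adapt to your cluster:

```shell
#!/bin/bash
#SBATCH --nodes=2               # assumption: 2 nodes
#SBATCH --ntasks-per-node=2     # one task per GPU on each node
#SBATCH --gpus-per-node=2       # assumption: 2 GPUs per node
#SBATCH --gpu-bind=single:1     # each task is bound to exactly one GPU

# Each Julia process then sees a single device, so plain CUDA.jl code works
# without any manual device!() selection.
srun julia --project my_script.jl
```

With this binding, length(CUDA.devices()) is 1 inside every process, so there is nothing to select on the Julia side.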
myid() is global, yes, but I assume (I don't know for sure) that processes on the same node have adjacent ids, so you could use device_id = myid() % num_gpus + 1 and then device!(devices()[device_id]), for example. I haven't tested this method and wouldn't recommend it, but it could be a starting point to try something out. Using the SLURM option is much more ergonomic, and I would recommend it instead.
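To spell out that idea, a sketch of the round-robin assignment might look like the following. As said above, this is untested: it assumes workers have already been added (via addprocs or ClusterManagers.jl) and that worker ids on the same node are adjacent:

```julia
# Untested sketch: round-robin each worker onto a local GPU by worker id.
using Distributed
@everywhere using CUDA

@everywhere function bind_gpu!()
    gpus = collect(CUDA.devices())           # devices visible on *this* node
    dev  = gpus[myid() % length(gpus) + 1]   # assumes ids are adjacent per node
    device!(dev)                             # make it the current device
    @info "worker $(myid()) bound to $dev"
end

# Run the binding once on every worker before doing GPU work.
foreach(w -> remotecall_wait(bind_gpu!, w), workers())
```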
Oh, thank you for the clarification. The reason I am pursuing an option that doesn't rely on SLURM is that I want to be able to write code that uses both the GPUs and the CPUs for compute. That way I would assign a GPU to each of the first workers on a node, and have a few more workers do some CPU work using Threads.
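One way to sketch that split without SLURM is to group workers by hostname, give the first length(CUDA.devices()) workers on each node a GPU, and keep the rest as CPU workers. This is an untested illustration; the pool names are hypothetical and the grouping assumes workers were already added:

```julia
# Untested sketch: per node, bind the first workers to GPUs, keep the rest for CPU work.
using Distributed
@everywhere using CUDA, Sockets

# Group worker ids by the node they run on.
bynode = Dict{String,Vector{Int}}()
for w in workers()
    h = remotecall_fetch(gethostname, w)
    push!(get!(bynode, h, Int[]), w)
end

gpu_workers, cpu_workers = Int[], Int[]
for ws in values(bynode)
    sort!(ws)
    ngpu = remotecall_fetch(() -> length(CUDA.devices()), first(ws))
    for (i, w) in enumerate(ws)
        if i <= ngpu
            # bind this worker to the i-th device local to its node
            remotecall_wait(d -> device!(collect(CUDA.devices())[d]), w, i)
            push!(gpu_workers, w)
        else
            push!(cpu_workers, w)  # dispatch threaded CPU jobs to these instead
        end
    end
end
```

You could then target pmap or remotecall at gpu_workers for GPU kernels and at cpu_workers for the Threads-based work.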
Note that this should work in both cases: if you use device binding, then length(CUDA.devices()) == 1, and so you will always select device 0. If you don’t, then it will assign them in a round-robin manner (and oversubscribe them fairly evenly if you have more procs than devices per node). It only relies on the ids being sequential per node.