Hey, after checking with the nvidia-smi command, only one of the four GPUs is actually used when running this code. It seems the work is not being distributed to the workers. Am I missing something? (A minimal sketch of the per-worker dispatch I had in mind is at the end of this post.)
using Plots
using Flux
using Flux: params
# spawn one worker per device
using Distributed, CUDA
addprocs(length(devices()))
@everywhere using CUDA
# assign devices
asyncmap(zip(workers(), devices())) do (p, d)
    remotecall_wait(p) do
        @info "Worker $p uses $d"
        device!(d)
    end
end
# Define the model
m = Chain(
    Dense(10, 5, σ),  # first layer: 10 inputs, 5 outputs, sigmoid activation
    Dense(5, 1),      # second layer: 5 inputs, 1 output
    identity          # identity activation for the output layer
) |> gpu
# Define a loss function and an optimizer
loss(x, y) = Flux.mse(m(x), y)
optimizer = ADAM()
# Generate some synthetic data
X = Array{Float64}(rand(10, 1000)) |> gpu
Y = Array{Float64}(rand(1, 1000)) |> gpu
data = [(X, Y)]
# Train the model
@time CUDA.@allowscalar for i in 1:600
    Flux.train!(loss, params(m), data, optimizer)
end
# Make predictions on the training data
predictions = m(X)
# Plot the predictions versus the true values
scatter(vec(cpu(Y)), vec(cpu(predictions)), xlabel="True values", ylabel="Predictions", primary=false)

Here is the output:
From worker 3: [ Info: Worker 3 uses CuDevice(1)
From worker 4: [ Info: Worker 4 uses CuDevice(2)
From worker 2: [ Info: Worker 2 uses CuDevice(0)
From worker 5: [ Info: Worker 5 uses CuDevice(3)
6.436934 seconds (9.83 M allocations: 684.209 MiB, 3.08% gc time, 1.01% compilation time: 100% of which was recompilation)
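To make the question concrete, here is roughly the kind of per-worker dispatch I expected would be needed (a minimal, untested sketch under my own assumptions: train_on_worker is just an illustrative name, each worker trains an independent copy of the model on its own slice of the data, and there is no gradient averaging between devices):

# Sketch only: relies on the addprocs / device! assignment from the code above.
@everywhere using Flux, CUDA

@everywhere function train_on_worker(Xc, Yc)                   # hypothetical helper name
    m = Chain(Dense(10, 5, σ), Dense(5, 1), identity) |> gpu   # lands on this worker's assigned device
    loss(x, y) = Flux.mse(m(x), y)
    opt = ADAM()
    data = [(gpu(Xc), gpu(Yc))]
    for _ in 1:600
        Flux.train!(loss, Flux.params(m), data, opt)
    end
    return cpu.(collect(Flux.params(m)))                        # return the trained weights as CPU arrays
end

# CPU copies of the data, split column-wise into one chunk per worker
Xc, Yc = rand(10, 1000), rand(1, 1000)
ranges = Iterators.partition(1:size(Xc, 2), cld(size(Xc, 2), nworkers()))
results = pmap(r -> train_on_worker(Xc[:, r], Yc[:, r]), ranges)

Is something like this necessary, or should Flux.train! already spread the work across the devices assigned to the workers?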