CUDAnative: using multiple GPUs

gpu
cudanative
parallel

#1

Hi,

I am using Julia on Docker (the maleadt/juliagpu image) and I’d like to use multiple GPUs instead of one.
The docs mention this:

dev = CuDevice(0)
CuContext(dev) do ctx
    # allocate things in this context
    @cuda ...
end

but it does not seem to work. I run this block twice, once per device, but no matter which device number I choose, it always uses a single GPU. Any help is greatly appreciated.


#2

Not sure, but you might need an @async: that do-block syntax may block until the first device finishes its task.


#3

My code looks like this:

function calculateStuff(gpuId)
  dev = CuDevice(gpuId)
  CuContext(dev) do ctx
    @cuda (threads, blocks) expensiveFunction(...)
    synchronize()
  end
end

@spawn calculateStuff(0)
@spawn calculateStuff(1)

I launch Julia with two worker processes, so I figured that should do the trick? Nonetheless, only one GPU is used.


#4

I don’t have a system with multiple GPUs, so I haven’t really worked on a decent multi-GPU API.
What I assume is happening here is that you aren’t actually executing this code in separate processes. CUDA is an API with global state, and the CuContext(dev) call sets the global context for all subsequent API calls from that process.

Maybe try the following (again, untested, but I think it should work):

using Distributed

addprocs(2)  # or start Julia with `julia -p 2`

@everywhere using CUDAdrv, CUDAnative

@everywhere function expensiveFunction()
    # ...
end

@everywhere function calculateStuff(gpuId)
    dev = CuDevice(gpuId)
    CuContext(dev) do ctx
        return expensiveFunction()
    end
end

# pin each call to a distinct worker so each process gets its own context
s1 = @spawnat 2 calculateStuff(0)
s2 = @spawnat 3 calculateStuff(1)

fetch(s1)
fetch(s2)

I’m not too familiar with Distributed, so @everyone feel free to correct my use of the library.


#5

I resolved the issue. The problem was that all workers executed the same code: when you let workers preload files, they execute everything that’s not inside a function definition, for example.
I put everything GPU-related in a module in a separate file, which is loaded by each worker (-L module.jl). The “main” file then calls functions from the module using @spawn, and everything works as expected. Thanks for the help!
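In case it helps others, a minimal sketch of what that setup might look like. The file and module names (module.jl, GPUWork) and the placeholder kernel are illustrative, not the actual code from the original post:

```julia
# --- module.jl: loaded on every process via `julia -p 2 -L module.jl main.jl` ---
module GPUWork

using CUDAdrv, CUDAnative

# placeholder for the real expensive kernel
function expensiveKernel()
    return nothing
end

function calculateStuff(gpuId)
    dev = CuDevice(gpuId)
    CuContext(dev) do ctx
        @cuda (1, 1) expensiveKernel()
        synchronize()
    end
end

end # module

# --- main.jl: orchestration only, no top-level GPU code ---
using Distributed

# pin each call to a distinct worker so each process drives one GPU
r1 = @spawnat 2 GPUWork.calculateStuff(0)
r2 = @spawnat 3 GPUWork.calculateStuff(1)

fetch(r1)
fetch(r2)
```

Since every process loads module.jl, the only top-level code that runs everywhere is the module definition itself; the actual GPU work only happens where it is explicitly spawned.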


#6

Hi,
That sounds a lot like a problem I’m having. Would you mind posting a gist with a small example of your setup? Thanks a million!