Multiple GPUs with CuStateVec in CUDA.jl


I’m trying to use multiple GPUs for computations involving CuStateVec from the CUDA.jl implementation, specifically the function swapIndexBitsMultiDevice!.
However, this function always throws the following error:

ArgumentError: cannot take the CPU address of a CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}

The following minimal working example reproduces the behavior on my local installation of CUDA 12.3 (with custatevec version 1.5.0) on A100 GPUs.

using CUDA, cuStateVec

psis = map(collect(devices())) do d
    CuStateVec(ComplexF32, 5)
end

swapIndexBitsMultiDevice!(
    convert(Vector{CuStateVec}, psis), collect(devices()), 
    [1=>2], Int32[], Int32[], 

Sadly, I do not have access to a different cluster to test whether this problem occurs only on my installation.
Is there anything wrong with the MWE, specifically, are the psis initialized correctly?
There are no multi-GPU examples or test functions available to compare the MWE to.
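
In the absence of a multi-GPU example, a CPU-only reference can at least pin down what a single index-bit swap should do to the amplitudes. The sketch below is a hypothetical helper, not part of CUDA.jl or cuStateVec: the name `swap_index_bits` is made up for illustration, it assumes 0-based bit positions (as in the cuStateVec C API, which may differ from what the Julia wrapper expects), and it ignores how the state is distributed across devices.

```julia
# CPU-only reference: swap two index bits of a state vector held in a plain Vector.
# Bit positions b1 and b2 are 0-based (an assumption, see above).
function swap_index_bits(psi::Vector{T}, b1::Int, b2::Int) where {T}
    out = similar(psi)
    for i in 0:length(psi)-1
        x1 = (i >> b1) & 1
        x2 = (i >> b2) & 1
        # If the two bits differ, flipping both of them swaps their values.
        j = x1 == x2 ? i : xor(i, (1 << b1) | (1 << b2))
        out[j + 1] = psi[i + 1]   # Julia arrays are 1-based
    end
    return out
end

# 5 index bits, with amplitude k stored at (0-based) index k.
psi = ComplexF32.(0:31)
swapped = swap_index_bits(psi, 1, 2)
```

Applying the same swap twice should return the original vector, which gives a quick sanity check for any GPU result copied back to the host.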

I’m looking forward to any insights!

Thanks in advance and kind regards,

Pinging @kslimes, who wrote these wrappers.
