Passing CuArray to PyTorch Tensor

Hello everyone,

I am currently using this to pass CuArrays to PyTorch tensors:

using PyCall
tc = pyimport("torch")
using CuArrays 
x=cu(randn(1024,1024))
@time tc.from_numpy(Array(x)).cuda()
 0.025017 seconds (62 allocations: 20.00 MiB)
 ...

Since Array(x) moves x from the GPU to the CPU, and .cuda() moves it back from the CPU to the GPU, I was wondering whether there is a way to stay on the GPU and convert a CuArray to a PyTorch tensor without going back and forth between CPU and GPU.

Applying tc.from_numpy() directly to a CuArray does not work and causes a segfault.

Let me know if you have any ideas about how to do this.

Thank you!

One solution is to create a PyTorch tensor, wrap its GPU memory as a CuArray with Base.unsafe_wrap, and then manipulate it using CuArrays.jl.
This involves no copy at all.

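using PyCall, CuArrays, Random  # Random provides randn!
tc = pyimport("torch")
x_tc = tc.empty((1024, 1024), dtype=tc.float64, device="cuda")  # allocate directly on the GPU
x = Base.unsafe_wrap(CuArray, CuPtr{Float64}(x_tc.data_ptr()), (1024, 1024))  # no copy; x_tc must outlive x
randn!(x)
# randn!(x) modifies the PyTorch tensor x_tc because x_tc and the CuArray x share the same memory.
# PyTorch is row-major and Julia is column-major, so x is seen as the transpose of x_tc.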

I haven’t actually tried this code, so there might be minor mistakes…
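One detail that matters for the pointer trick regardless of element type: a contiguous PyTorch tensor stores its elements in row-major (C) order, while Julia arrays are column-major. Wrapping the 1024×1024 tensor above therefore gives a CuArray that is the transpose of x_tc (harmless for randn!, but not in general), and wrapping a non-square tensor with its PyTorch shape scrambles the elements. The CPU-only sketch below shows the effect with plain Arrays; `wrap_rowmajor` is a hypothetical helper, not part of any package. On the GPU, the analogous step would presumably be to wrap with the reversed shape and then call permutedims (or work with the transposed view directly), assuming the tensor is contiguous; that GPU variant is untested here.

```julia
# CPU-only sketch. `wrap_rowmajor` is a hypothetical helper, not part of any package.
# A contiguous PyTorch tensor stores elements in row-major (C) order;
# Julia arrays are column-major.

function wrap_rowmajor(buf::Vector{T}, rows::Int, cols::Int) where {T}
    length(buf) == rows * cols ||
        throw(DimensionMismatch("buffer length does not match rows * cols"))
    # Keep `buf` alive while we use a raw pointer into it.
    GC.@preserve buf begin
        # Wrapping with the reversed shape is zero-copy: the (cols, rows)
        # column-major view of row-major data is the transpose of the matrix.
        t = unsafe_wrap(Array, pointer(buf), (cols, rows))
        # permutedims copies into a new (rows, cols) matrix in the intended order.
        permutedims(t)
    end
end

# Row-major storage of the 2×3 matrix [1 2 3; 4 5 6] (hypothetical data).
buf = Float64[1, 2, 3, 4, 5, 6]

# Interpreting the same memory with the PyTorch shape as-is (reshape shares
# memory, much like unsafe_wrap) reads it column by column.
naive = reshape(buf, 2, 3)         # == [1 3 5; 2 4 6], not what PyTorch shows
fixed = wrap_rowmajor(buf, 2, 3)   # == [1 2 3; 4 5 6]
```

The reversed-shape wrap itself costs nothing; only the permutedims step copies, so if a transposed view is acceptable downstream, the copy can be skipped.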


Alternatively, though I'm not sure it's possible, a CuArray might be converted directly to a PyTorch tensor through torch.utils.dlpack.from_dlpack.

Thank you @yatra9 for your answer!
The code you proposed works with CuArrays v1.0.2, CUDAdrv v3.0.0 and NNlib v0.6.0.

Hi everyone. I've tried the pointer trick without success. I want to transfer the following tensor to a CuArray:

julia> t
PyObject tensor([[-0.0478,  0.1304,  0.0551],
        [ 0.1353, -0.1581,  0.0776],
        [-0.0804,  0.0388, -0.0387],
        [-0.0254, -0.0076, -0.0433],
        [ 0.0183, -0.0035, -0.0508]], device='cuda:0')

I used:

julia> x = Base.unsafe_wrap(CuArray, CuPtr{Float64}(t.data_ptr()), t[:shape])
5×3 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
  6.26441e-10  -6.16828e-14  1.51954e-19
  7.74017e-10  -1.62869e-22  6.22065e-24
  7.03467e-12  -2.82217e-19  8.2082e-29
  2.75222e-14  -1.10739e-18  1.41569e-28
 -9.14529e-16   1.54366e-20  2.38145e-24

As you can see, the resulting array does not contain the same data. Is something wrong with the CuPtr pointer?

Thank you,

My first thought: are there really 64-bit floats in the tensor? It's common to use 32-bit floats on the GPU, so that would be an easy thing to miss.

I don't really know how the GPU side works, though, so that's just my best guess.
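To illustrate the Float32/Float64 hypothesis: if the tensor holds float32 data but the pointer is wrapped as CuPtr{Float64}, each Float64 is assembled from the bits of two consecutive Float32 values, so you get half as many meaningful elements and tiny, meaningless magnitudes, similar in scale (around 1e-10) to the output above. The CPU-only sketch below mimics that with reinterpret on a plain Array; the values are the first four elements of the tensor above in memory (row-major) order, typed in by hand. As a hedged side note, PyTorch's repr usually omits the dtype when it is the default float32, and t.dtype, t.element_size() and t.is_contiguous() can be checked before choosing the Julia element type and shape.

```julia
# CPU-only illustration with hand-copied values (assumed float32, as PyTorch
# would store them): the first four elements of the tensor in memory order.
vals32 = Float32[-0.0478, 0.1304, 0.0551, 0.1353]

# Reading the same bytes as Float64, which is what CuPtr{Float64} does on the GPU,
# combines each pair of Float32 bit patterns into a single Float64.
misread = reinterpret(Float64, vals32)

println(length(misread))  # half as many elements as vals32
println(misread)          # tiny, meaningless magnitudes

# Reading the bytes back with the matching element type recovers the values.
recovered = reinterpret(Float32, misread)
```

With the matching element type the data is intact; on the GPU this would correspond to CuPtr{Float32}, combined with the row-major/column-major handling of the shape.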

True, but I've tried Float32 as well, with the same result. I think the original question also used Float64.