Help converting a PyTorch tensor to a Julia CuArray

I am trying to use PythonCall/juliacall to convert a PyTorch tensor to a Julia CuArray directly from the pointer (I want to avoid unnecessary copying if possible). Does anyone know what I am doing wrong? The output doesn’t look correct: the array should consist of only 1s and 0s (unless I am just misreading it).

MWE Colab Link

Code

import numpy as np
import torch
from juliacall import Main as jl

jl.seval("using CUDA, PythonCall")

sz = (100, 100)
arr = np.random.choice([0, 1], size=sz)

# Step 1: Create a PyTorch tensor and transfer it to GPU
tensor = torch.tensor(arr, dtype=torch.float32).cuda()
print(tensor)

# Step 2: Convert to CuArray using PythonCall
cu_arr = jl.unsafe_wrap(jl.CuArray, jl.PythonCall.getptr(tensor), sz)
print(cu_arr)

Output

tensor([[0., 1., 0.,  ..., 1., 0., 0.],
        [0., 1., 1.,  ..., 1., 1., 1.],
        [0., 1., 1.,  ..., 0., 1., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 1., 0.],
        [0., 1., 1.,  ..., 0., 1., 1.],
        [1., 1., 1.,  ..., 0., 0., 0.]], device='cuda:0')
PythonCall.C.PyObject[PythonCall.C.PyObject(2, Ptr{Nothing} @0x00005bbbbba26dd0) PythonCall.C.PyObject(0, Ptr{Nothing} @0x00005bbbbba26dd0) PythonCall.C.PyObject(133973972021008, Ptr{Nothing} @0x000079daf8e4c1b0) PythonCall.C.PyObject(133978876351936, Ptr{Nothing} @0x000079d93f02f5b0) PythonCall.C.PyObject(19, Ptr{Nothing} @0x7c25d4026712e6d2)...

PythonCall.getptr is an internal function and does not get the pointer you want.

You can get the CUDA pointer from the __cuda_array_interface__.
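For example, a quick sketch of reading the pointer on the Python side: the interface is a dict whose "data" entry holds the raw device pointer along with a read-only flag.

import torch

tensor = torch.zeros(4, 4, dtype=torch.float32).cuda()

# __cuda_array_interface__ is a dict; "data" is a (device_pointer, read_only) pair
cai = tensor.__cuda_array_interface__
ptr = cai["data"][0]
print(cai["shape"], cai["typestr"], hex(ptr))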

Oh okay, so if I do this to get the underlying pointer, is there a way to convert that to a Julia pointer using juliacall?

import numpy as np
import torch

sz = (100, 100)
arr = np.random.choice([0, 1], size=sz)

# Step 1: Create a PyTorch tensor and transfer it to GPU
tensor = torch.tensor(arr, dtype=torch.float32).cuda()

# Step 2: Get pointer of the tensor
ptr = tensor.data_ptr()
print("pointer: ", ptr)
pointer:  138064357793280

CUDA.jl provides a CuPtr{T}, but I am having trouble converting this integer into a pointer that Julia understands.

CuPtr{Float32}(pyconvert(UInt, ptr))

perhaps?

Oh, this is almost there. I just need to figure out how to pass a Python variable into a jl.seval("") call. When I hardcode the pointer integer, this works:

import numpy as np
import torch
from juliacall import Main as jl

jl.seval("using CUDA, PythonCall")

sz = (100, 100)
arr = np.random.choice([0, 1], size=sz)

# Step 1: Create a PyTorch tensor and transfer it to GPU
tensor = torch.tensor(arr, dtype=torch.float32).cuda()
print("Pytorch Tensor : ", tensor)

# Step 2: Get pointer of the tensor
ptr = tensor.data_ptr()
print("pointer: ", ptr)

# DOESN"T WORK
# cu_ptr = jl.seval("""
# CuPtr{Float32}(pyconvert(UInt, ptr))
# """)

# Convert to julia CuPtr (IDK how to pass in the variable `ptr`)
cu_ptr = jl.seval("""
CuPtr{Float32}(pyconvert(UInt, 138064357793280))
""")
print("julia pointer: ", cu_ptr)

# Convert to CUDA array
cu_arr = jl.unsafe_wrap(jl.CuArray, cu_ptr, sz)
cu_arr

Output
Pytorch Tensor :  tensor([[0., 0., 0.,  ..., 0., 1., 1.],
        [0., 1., 0.,  ..., 1., 1., 1.],
        [0., 1., 1.,  ..., 0., 1., 0.],
        ...,
        [0., 1., 0.,  ..., 0., 1., 1.],
        [1., 0., 1.,  ..., 0., 1., 1.],
        [1., 1., 0.,  ..., 1., 0., 0.]], device='cuda:0')
pointer:  138064357752832
julia pointer:  CuPtr{Float32}(0x00007d919d009e00)
100×100 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0  1.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0  …  1.0  1.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0
 0.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0  0.0     1.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
 0.0  0.0  1.0  1.0  0.0  1.0  1.0  0.0  1.0  0.0     1.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0
 0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0  0.0     0.0  0.0  1.0  0.0  1.0  1.0  0.0  1.0  1.0
 1.0  1.0  0.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0     0.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0  0.0
 1.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  0.0  1.0  …  0.0  0.0  1.0  1.0  1.0  0.0  1.0  0.0  0.0
 1.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0     1.0  0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0
 1.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0     1.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0
 0.0  1.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0     0.0  1.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0
 0.0  1.0  0.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0     0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  1.0
 ⋮                        ⋮                        ⋱                      ⋮                   
 0.0  0.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0     0.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0
 0.0  1.0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0     1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0  1.0
 0.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0     1.0  0.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0
 0.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0     1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
 1.0  0.0  1.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  …  0.0  0.0  1.0  1.0  0.0  1.0  1.0  0.0  1.0
 0.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0  1.0  0.0     0.0  0.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0
 1.0  1.0  1.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0     1.0  0.0  1.0  1.0  0.0  1.0  0.0  1.0  0.0
 0.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0     1.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0  1.0
 0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0     0.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  1.0

If you don’t mind the extra dependency, DLPack.jl (https://github.com/pabloferz/DLPack.jl) makes this trivial.
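For example, a rough sketch of what the zero-copy conversion could look like from the Python side, assuming DLPack.jl’s wrap(tensor, to_dlpack) entry point (the package README documents the exact, version-dependent API):

import torch
from juliacall import Main as jl

jl.seval("using DLPack")

tensor = torch.rand(100, 100).cuda()

# Assumption: DLPack.wrap takes the Python tensor plus torch.to_dlpack and
# returns a Julia array (a CuArray here) that shares the same GPU memory
from_torch = jl.seval("DLPack.wrap")
cu_arr = from_torch(tensor, torch.to_dlpack)
print(jl.typeof(cu_arr))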

Is that for CUDA only? I wrote the kernels for a package using KernelAbstractions.jl, so I want to build this out in a vendor-neutral way. I am just using PyTorch and CUDA for testing the implementation in Python, but it seems like the pointer approach might be more flexible than DLPack for this?

Specifically this package btw https://github.com/Dale-Black/DistanceTransforms.jl/blob/master/src/transform.jl

It is, but you could extend https://github.com/pabloferz/DLPack.jl/blob/main/src/cuda.jl to work for other GPU array types. See https://github.com/pabloferz/DLPack.jl/blob/main/src/DLPack.jl#L36-L39

You can create an anonymous function to pass ptr into:

cu_ptr = jl.seval("""
ptr -> CuPtr{Float32}(pyconvert(UInt, ptr))
""")(ptr)

(But the above suggestions to use DLPack are indeed simpler.)


Brilliant, thank you so much. I will look into DLPack more too. If it’s easy to extend to various GPU vendors, that is incredible.