Using Interpolations.jl on CuVector

I am feeding a CuVector to an interpolator created with Interpolations.jl and want to get the output as a CuVector, but I am not sure how to achieve this.

Here is what I did. I create an interpolator using sampling points:

using Interpolations
xsample = 0:0.1:1
ysample = rand(length(xsample))
itp = CubicSplineInterpolation(xsample, ysample)

Then I create a CuVector that contains the x-values where I would like to evaluate the interpolator, and perform the interpolation:

using CUDA
cx = cu(rand(100))
y = itp(cx)

The type of y is Vector, not CuVector. I also tried passing a CuVector as ysample when creating itp, but the result was the same.
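
For reference, that attempt looked roughly like this (a sketch of what I tried):

itp2 = CubicSplineInterpolation(xsample, cu(ysample)) # sampling values stored on the GPU
y2 = itp2(cx) # the result still comes back as a plain Vector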

I also tried to preallocate y as a CuVector and use the dot syntax for element-wise assignment:

cy = similar(cx)
cy .= itp.(cx)

but this generates an error:

ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, Interpolations.Extrapolation{Float32, 1, ScaledInterpolation{Float32, 1, Interpolations.BSplineInterpolation{Float32, 1, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, BSpline{Cubic{Line{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Line{OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}, BSpline{Cubic{Line{OnGrid}}}, Throw{Nothing}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

I would appreciate any help!

As far as I know, Interpolations.jl is not compatible with CUDA.jl at present. The KernelError above is the symptom: the interpolation object stores its coefficients in ordinary CPU Arrays, which are not isbits, so it cannot be passed into a GPU kernel.
See https://github.com/JuliaMath/Interpolations.jl/issues/357
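
Until that is resolved, a stopgap is to bring the query points back to the CPU, evaluate there, and copy the result to the device, at the cost of two host/device transfers. A minimal sketch:

using Interpolations, CUDA

# Copy the GPU query points to the host, interpolate on the CPU,
# then copy the results back to the GPU.
cy = cu(itp.(Array(cx)))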

@N5N3, thanks for pointing out the Issue.

I wonder if anyone knows of alternative packages that support GPU interpolation. I checked a few packages listed in Home · Interpolations.jl, but none of them explicitly indicates GPU support.

@wsshin
With ExBroadcast, you can do BSpline interpolation on Nvidia GPUs via the broadcast interface.

Example code:

module CUDAInterp
using ExBroadcast, Interpolations, CUDA, Adapt
a = randn(ComplexF32, 401, 401)
itp = CubicSplineInterpolation((-100:0.5f0:100, -100:0.5f0:100), a) # only BSpline is tested (Lanczos should also work)
cuitp = adapt(CuArray, itp) # you need to transfer the data into the GPU's memory
cuitp.(-100:0.01f0:100, (-100:0.01f0:100)') # 20001×20001 CuArray{ComplexF32, 2}
cuitp.(CUDA.randn(100), CUDA.randn(100)) # 100-element CuArray{ComplexF32, 1}
cuitp.(randn(100), randn(100)) # error: CPU arrays can't be broadcast with the GPU interpolant
end
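
For context: adapt from Adapt.jl rebuilds the interpolation object with its internal coefficient arrays converted to CuArrays, which is why the broadcast can then run entirely on the device.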

In the meantime, there is this section in the Interpolations.jl docs.
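
Applied to the original 1-D question, the pattern would look something like this (a minimal sketch, assuming the adapt-based GPU support described there):

using Interpolations, CUDA, Adapt

xsample = 0:0.1f0:1
ysample = rand(Float32, length(xsample))
itp = CubicSplineInterpolation(xsample, ysample)

cuitp = adapt(CuArray, itp) # move the spline coefficients to GPU memory
cx = CUDA.rand(100)
cy = cuitp.(cx) # evaluates on the GPU; cy is a CuVector{Float32}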

Thanks for adding it @N5N3. :slight_smile:

And the performance is really great! Personally, I compare such operations to the well-optimized FFT functions:

# โ•”โ•โ•ก 7d69f92c-ca66-11ed-0998-2588be6b26c1
using CUDA, Interpolations, Adapt, FFTW

# โ•”โ•โ•ก bc4693a7-5f2c-465f-956c-1364dba860a3
arr = CUDA.rand(ComplexF32, 1024, 1024); 

# โ•”โ•โ•ก 6b3d7f61-89c3-48c8-925c-04a566ed8d50
itp = Interpolations.interpolate(arr, (BSpline(Linear()), BSpline(Linear())));

# โ•”โ•โ•ก 9184e149-9322-45d0-9f5f-64943e09379e
cuitp = adapt(CuArray{eltype(arr)}, itp);

# โ•”โ•โ•ก a74017c5-a171-4cc1-a123-263374cd9a1f
y = CuArray(range(1.5, 1023.5, length=1023));

# โ•”โ•โ•ก beb4db2f-a5c9-462f-9654-3a9f3e2a2dbe
x = y'

0.000900 seconds (83 CPU allocations: 4.969 KiB) (1 GPU allocation: 15.969 MiB, 1.43% memmgmt time)
# โ•”โ•โ•ก e105caeb-226c-4282-82bc-93b8b8151085
 CUDA.@time cuitp.(y, y');


# 0.004722 seconds (84 CPU allocations: 4.594 KiB, 88.76% gc time) (1 GPU allocation: 8.000 MiB, 0.34% memmgmt time)
# โ•”โ•โ•ก 47ae0a8e-bce5-4ad4-ba2b-e6d9518d0600
CUDA.@time fft(arr);