Using Interpolations.jl on CuVector

I am feeding a CuVector to an interpolator created with Interpolations.jl and want to get the output back as a CuVector, but I am not sure how to achieve this.

Here is what I did. I create an interpolator using sampling points:

using Interpolations
xsample = 0:0.1:1
ysample = rand(length(xsample))
itp = CubicSplineInterpolation(xsample, ysample)

Then I create a CuVector that contains the x-values where I would like to evaluate the interpolator, and perform the interpolation:

using CUDA
cx = cu(rand(100))
y = itp(cx)

The type of y is Vector, not CuVector. I also tried using a CuVector as ysample when creating itp, but the result was the same.
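Roughly, that second attempt looked like this (names here are just for illustration):

itp2 = CubicSplineInterpolation(xsample, cu(ysample))  # CuVector of sample values
y2 = itp2(cx)                                          # result was the same: a plain Vector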

I also tried to preallocate y as a CuVector and use the dot syntax for element-wise assignment:

cy = similar(cx)
cy .= itp.(cx)

but this generates an error:

ERROR: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, Interpolations.Extrapolation{Float32, 1, ScaledInterpolation{Float32, 1, Interpolations.BSplineInterpolation{Float32, 1, OffsetArrays.OffsetVector{Float32, Vector{Float32}}, BSpline{Cubic{Line{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Line{OnGrid}}}, Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}}, BSpline{Cubic{Line{OnGrid}}}, Throw{Nothing}}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

I would appreciate any help!

As far as I know, Interpolations.jl is not compatible with CUDA.jl at present.
See https://github.com/JuliaMath/Interpolations.jl/issues/357
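Until that is resolved, a stopgap (just a sketch, and not GPU-accelerated) is to round-trip the query points through host memory:

using Interpolations, CUDA

xsample = 0:0.1:1
ysample = rand(length(xsample))
itp = CubicSplineInterpolation(xsample, ysample)

cx = cu(rand(100))
cy = cu(itp.(Array(cx)))  # copy cx to the CPU, evaluate there, copy the result back to the GPU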

@N5N3, thanks for pointing out the Issue.

I wonder if anyone knows of alternative packages that support GPU interpolation. I checked a few packages listed in Home · Interpolations.jl, but none of them explicitly indicates GPU support.

@wsshin
With ExBroadcast, you can do BSpline interpolations on Nvidia GPUs via the broadcast interface.

Example code:

module CUDAInterp
using ExBroadcast, Interpolations, CUDA, Adapt
a = randn(ComplexF32,401,401)
itp = CubicSplineInterpolation((-100:0.5f0:100,-100:0.5f0:100), a) # only BSpline is tested (Lanczos should also work)
cuitp = adapt(CuArray,itp) # you need to transfer the data into GPU's memory
cuitp.(-100:0.01f0:100, (-100:0.01f0:100)') # 20001×20001 CuArray{ComplexF32, 2}
cuitp.(CUDA.randn(100), CUDA.randn(100))    # 100-element CuArray{ComplexF32, 1}
cuitp.(randn(100), randn(100))              # error: plain CPU arrays as query points
end
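For the 1D case in the original question, the same adapt-based approach would look roughly like this (a sketch; the Float32 sample grid and the name cuitp1d are just for illustration):

using ExBroadcast, Interpolations, CUDA, Adapt

xsample = 0f0:0.1f0:1f0                  # Float32 grid so the coefficients match the CuArray eltype
ysample = rand(Float32, length(xsample))
itp = CubicSplineInterpolation(xsample, ysample)
cuitp1d = adapt(CuArray, itp)            # move the interpolation coefficients to GPU memory
cx = CUDA.rand(100)                      # query points already on the GPU
cy = cuitp1d.(cx)                        # 100-element CuArray{Float32, 1}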

In the meantime, there is this section in the Interpolations.jl docs.

Thanks for adding it @N5N3. :slight_smile:

And performance is really great! Personally, I compare such operations to the well-optimized FFT functions:

# ╔═╡ 7d69f92c-ca66-11ed-0998-2588be6b26c1
using CUDA, Interpolations, Adapt, FFTW

# ╔═╡ bc4693a7-5f2c-465f-956c-1364dba860a3
arr = CUDA.rand(ComplexF32, 1024, 1024);

# ╔═╡ 6b3d7f61-89c3-48c8-925c-04a566ed8d50
itp = Interpolations.interpolate(arr, (BSpline(Linear()), BSpline(Linear())));

# ╔═╡ 9184e149-9322-45d0-9f5f-64943e09379e
cuitp = adapt(CuArray{eltype(arr)}, itp);

# ╔═╡ a74017c5-a171-4cc1-a123-263374cd9a1f
y = CuArray(range(1.5, 1023.5, length=1023));

# ╔═╡ beb4db2f-a5c9-462f-9654-3a9f3e2a2dbe
x = y'

# ╔═╡ e105caeb-226c-4282-82bc-93b8b8151085
CUDA.@time cuitp.(y, y');
# 0.000900 seconds (83 CPU allocations: 4.969 KiB) (1 GPU allocation: 15.969 MiB, 1.43% memmgmt time)

# ╔═╡ 47ae0a8e-bce5-4ad4-ba2b-e6d9518d0600
CUDA.@time fft(arr);
# 0.004722 seconds (84 CPU allocations: 4.594 KiB, 88.76% gc time) (1 GPU allocation: 8.000 MiB, 0.34% memmgmt time)

@N5N3 @roflmaostc If you run the first three lines of the code in the section mentioned above, you get a scalar indexing error. I.e., running

using Interpolations, Adapt, CUDA # Or any other GPU package
itp = Interpolations.interpolate([1, 2, 3], BSpline(Linear())); # construct the interpolant object on CPU
cuitp = adapt(CuArray{Float32}, itp); 

results in

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.

The example still works, i.e., you can use cuitp, but is this expected?

For me this works without any issues. It's only when you want to show cuitp (e.g. by not including the ;) that I get the

ERROR: Scalar indexing is disallowed.
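To make this concrete, a minimal sketch of the behavior described above (names are just for illustration): broadcast evaluation over GPU arrays works fine; it is only displaying the interpolant object itself that iterates it elementwise and triggers the scalar indexing error.

using Interpolations, Adapt, CUDA

itp = Interpolations.interpolate([1f0, 2f0, 3f0], BSpline(Linear()))
cuitp = adapt(CuArray{Float32}, itp);  # trailing ; suppresses display, so nothing iterates the GPU array
cuitp.(CuArray([1.5f0, 2.5f0]))        # broadcast evaluation on the GPU works
# cuitp                                # showing the interpolant iterates it -> ERROR: Scalar indexing is disallowed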