# KernelAbstractions for splines

I’m trying to code a simple b-spline (and then NURBS) on the GPU using kernels to speed up a ton of simultaneous evaluations.

I’m having trouble because Cox-deBoor is recursive and throws an error when I try to use this as a kernel. Alternatively, I could use de Boor’s method but this requires allocating an array, which is also not allowed in a kernel. In the code below, I avoid the problem by “manually” unrolling the recursion in Cox-de Boor, but I’m sure there is a better way.

There is also the issue that I need the spline data on the GPU, and I think I have correctly handled this using a wrapper type, but perhaps this isn’t optimal either.

``````#wrapper type
struct BSplineCurve{A<:AbstractArray,V<:AbstractVector,T<:Int} <: Function
pnts::A
knots::V
degree::T
end
using StaticArrays
function BSplineCurve(pnts;degree=3)
count,T = size(pnts, 2),promote_type(eltype(pnts),Float32)
knots = [zeros(degree); collect(range(0, count-degree) / (count-degree)); ones(degree)]
BSplineCurve(pnts,SA{T}[knots...],degree)
end
function (l::BSplineCurve)(u::T) where T
n = size(l.pnts, 2)
pt = SA{T}[0,0]
for k in 1:n
pt += Bd(l.knots,u,k,l.degree)*l.pnts[:,k]
end
pt
end
# terrible unrolled recursion
function Bd(knots, u, k, d)
d==1 && return Bd1(knots,u,k)
d==2 && return Bd2(knots,u,k)
return Bd3(knots,u,k)
end
function Bd3(knots, u, k)
d=3
α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*Bd2(knots,u,k),0f0)
β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd2(knots,u,k+1),0f0)
α+β
end
function Bd2(knots, u, k)
d=2
α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*Bd1(knots,u,k),0f0)
β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd1(knots,u,k+1),0f0)
α+β
end
function Bd1(knots, u, k)
d=1
α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k]),0f0)
β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1]),0f0)
α+β
end

# Does it work with KernelAbstractions?
using CUDA, KernelAbstractions
@kernel function _test!(a::AbstractArray,l::BSplineCurve)
# Map index to physical space
I = @index(Global)
u = (I-1)/(length(a)-1)
q = l(u)
a[I] = q'*q # just want some scalar output
end
test!(a,l)=_test!(get_backend(a),64)(a,l,ndrange=length(a))

a = CUDA.zeros(64)
square = BSplineCurve(SA[5 5 0 -5 -5 -5  0  5 5
0 5 5  5  0 -5 -5 -5 0],degree=1)

test!(a,square)
a|>Array # yes!
``````

Your `pt = SA{T}[0.0]` in `l` looks wrong; that creates an `Array{SA{T}}`

It’s `SA{T}[0, 0]` , so I think it makes a 2D static vector of the correct type without allocating an array. I was more looking for ways around the recursion issue, but if you have a better approach, I’m happy to hear it.

Who thought this was a good idea?

``````  SA{T}[ elements ]

Create SArray literals using array construction syntax. The element type is
inferred by promoting elements to a common type or set to T when T is
provided explicitly.

Examples:
``````

Very easy to confuse with the normal Julia syntax…

but this requires allocating an array

If used with care `MArray` works on the GPU.

is recursive

There is some fundamental theorem of computing that states that every recursive algorithm can be implemented as an iterative one, but of course you still need workspace (hence MArray).

If you know that you need to unroll it a fixed amount. You can also use a `Val{N}` and Julia will happily inline the entirety.

So it would be something like:

``````function Bd1(knots, u, k, ::Val{d}) where d
if d == 1
factor = 1
else
factor = Bd(knots, u, k+1, Val(d-1))
end

α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*factor,0f0)
β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*factor,0f0)
α+β
end
``````

Isn’t the iterative implementation simply:

``````function Bd(knots, u, k, max_d)
factor = 1.0

for d in 1:max_d
α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*factor,0f0)
β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*factor,0f0)
factor = α+β
end
return factor
``````

Thanks for the tips!

The de Boor algorithm is a stable iterative version of Cox-de Boor, but it requires an updateable array. I’ll check into `MArray`, I haven’t used those before. What do I need to be careful of, in order to make the GPU happy with it? Do I need to allocate it within the struct, or within the `l` function definition?

Edit: I just tried using `::Val{d}`. This seems to work:

``````struct BSplineCurve{d,A<:AbstractArray,V<:AbstractVector} <: Function
pnts::A
knots::V
end
using StaticArrays
function BSplineCurve(pnts;degree=3)
count,T = size(pnts, 2),promote_type(eltype(pnts),Float32)
knots = SA{T}[[zeros(degree); collect(range(0, count-degree) / (count-degree)); ones(degree)]...]
BSplineCurve{degree,typeof(pnts),typeof(knots)}(pnts,knots)
end
function (l::BSplineCurve{d})(u::T) where {T,d}
pt = SA{T}[0, 0]
for k in 1:size(l.pnts, 2)
pt += Bd(l.knots,u,k,Val(d))*l.pnts[:,k]
end
pt
end
Bd(knots, u, k, ::Val{0}) = 1
function Bd(knots, u, k, ::Val{d}) where d
B = 0
knots[k]<u≤knots[k+d] && (B+=(u-knots[k])/(knots[k+d]-knots[k])*Bd(knots,u,k,Val(d-1)))
knots[k+1]≤u<knots[k+d+1] && (B+=(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd(knots,u,k+1,Val(d-1)))
return B
end
``````
1 Like

@vchuravy I’m Sorry to dig this up from six months ago; I am working on extending this B-Spline code to enable modification of the control points. I am using this exact code (that’s why I didn’t make a new post).

If the code is run on the CPU, specifying the input of the `BSplineCurve` constructor as `MArrays` does the trick (I can change the control points). However, this doesn’t run on the GPU as MArray is not an isbit.

It seems `MArray` cannot be used on the GPU (I am not sure).

Do you see a simple fix for this, or making it work on the GPU with the possibility of changing `BSplineCurve.pnts` with new values would require writing all this into CUDA code using Array/CuArray for storage?