KernelAbstractions for splines

weymouth · September 27, 2023, 12:14pm

I’m trying to code a simple b-spline (and then NURBS) on the GPU using kernels to speed up a ton of simultaneous evaluations.

I’m having trouble because Cox-deBoor is recursive and throws an error when I try to use this as a kernel. Alternatively, I could use de Boor’s method but this requires allocating an array, which is also not allowed in a kernel. In the code below, I avoid the problem by “manually” unrolling the recursion in Cox-de Boor, but I’m sure there is a better way.

There is also the issue that I need the spline data on the GPU, and I think I have correctly handled this using a wrapper type, but perhaps this isn’t optimal either.

#wrapper type
struct BSplineCurve{A<:AbstractArray,V<:AbstractVector,T<:Int} <: Function
    pnts::A
    knots::V
    degree::T
end
using StaticArrays
function BSplineCurve(pnts;degree=3)
    count,T = size(pnts, 2),promote_type(eltype(pnts),Float32)
    knots = [zeros(degree); collect(range(0, count-degree) / (count-degree)); ones(degree)]
    BSplineCurve(pnts,SA{T}[knots...],degree)
end
function (l::BSplineCurve)(u::T) where T
    n = size(l.pnts, 2)
    pt = SA{T}[0,0]
    for k in 1:n
        pt += Bd(l.knots,u,k,l.degree)*l.pnts[:,k]
    end
    pt
end
# terrible unrolled recursion 
function Bd(knots, u, k, d)
    d==1 && return Bd1(knots,u,k)
    d==2 && return Bd2(knots,u,k)
    return Bd3(knots,u,k)
end
function Bd3(knots, u, k)
    d=3
    α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*Bd2(knots,u,k),0f0)
    β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd2(knots,u,k+1),0f0)
    α+β
end
function Bd2(knots, u, k)
    d=2
    α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*Bd1(knots,u,k),0f0)
    β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd1(knots,u,k+1),0f0)
    α+β
end
function Bd1(knots, u, k)
    d=1
    α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k]),0f0)
    β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1]),0f0)
    α+β
end

# Does it work with KernelAbstractions?
using CUDA, KernelAbstractions
@kernel function _test!(a::AbstractArray,l::BSplineCurve)
    # Map index to physical space
    I = @index(Global)
    u = (I-1)/(length(a)-1)
    q = l(u)
    a[I] = q'*q # just want some scalar output
end
test!(a,l)=_test!(get_backend(a),64)(a,l,ndrange=length(a))

a = CUDA.zeros(64)
square = BSplineCurve(SA[5 5 0 -5 -5 -5  0  5 5
                         0 5 5  5  0 -5 -5 -5 0],degree=1)

test!(a,square)
a|>Array # yes!

vchuravy · September 27, 2023, 12:57pm

Your pt = SA{T}[0.0] in l looks wrong; that creates an Array{SA{T}}

weymouth · September 27, 2023, 1:30pm

It’s SA{T}[0, 0] , so I think it makes a 2D static vector of the correct type without allocating an array. I was more looking for ways around the recursion issue, but if you have a better approach, I’m happy to hear it.

vchuravy · September 27, 2023, 11:17pm

Who thought this was a good idea?

  SA{T}[ elements ]

  Create SArray literals using array construction syntax. The element type is
  inferred by promoting elements to a common type or set to T when T is
  provided explicitly.

  Examples:

Very easy to confuse with the normal Julia syntax…

but this requires allocating an array

If used with care MArray works on the GPU.

is recursive

There is some fundamental theorem of computing that states that every recursive algorithm can be implemented as an iterative one, but of course you still need workspace (hence MArray).

If you know that you need to unroll it a fixed amount. You can also use a Val{N} and Julia will happily inline the entirety.

So it would be something like:

function Bd1(knots, u, k, ::Val{d}) where d
    if d == 1
        factor = 1
    else
       factor = Bd(knots, u, k+1, Val(d-1))
    end

    α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*factor,0f0)
    β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*factor,0f0)
    α+β
end

vchuravy · September 27, 2023, 11:20pm

Isn’t the iterative implementation simply:

function Bd(knots, u, k, max_d)
factor = 1.0

for d in 1:max_d
   α = ifelse(knots[k]<u≤knots[k+d],(u-knots[k])/(knots[k+d]-knots[k])*factor,0f0)
   β = ifelse(knots[k+1]≤u<knots[k+d+1],(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*factor,0f0)
   factor = α+β
end
return factor

weymouth · September 28, 2023, 7:36am

Thanks for the tips!

The de Boor algorithm is a stable iterative version of Cox-de Boor, but it requires an updateable array. I’ll check into MArray, I haven’t used those before. What do I need to be careful of, in order to make the GPU happy with it? Do I need to allocate it within the struct, or within the l function definition?

Edit: I just tried using ::Val{d}. This seems to work:

struct BSplineCurve{d,A<:AbstractArray,V<:AbstractVector} <: Function
    pnts::A
    knots::V
end
using StaticArrays
function BSplineCurve(pnts;degree=3)
    count,T = size(pnts, 2),promote_type(eltype(pnts),Float32)
    knots = SA{T}[[zeros(degree); collect(range(0, count-degree) / (count-degree)); ones(degree)]...]
    BSplineCurve{degree,typeof(pnts),typeof(knots)}(pnts,knots)
end
function (l::BSplineCurve{d})(u::T) where {T,d}
    pt = SA{T}[0, 0]
    for k in 1:size(l.pnts, 2)
        pt += Bd(l.knots,u,k,Val(d))*l.pnts[:,k]
    end
    pt
end
Bd(knots, u, k, ::Val{0}) = 1
function Bd(knots, u, k, ::Val{d}) where d
    B = 0
    knots[k]<u≤knots[k+d] && (B+=(u-knots[k])/(knots[k+d]-knots[k])*Bd(knots,u,k,Val(d-1)))
    knots[k+1]≤u<knots[k+d+1] && (B+=(knots[k+d+1]-u)/(knots[k+d+1]-knots[k+1])*Bd(knots,u,k+1,Val(d-1)))
    return B
end

marinlauber · March 22, 2024, 2:58pm

@vchuravy I’m Sorry to dig this up from six months ago; I am working on extending this B-Spline code to enable modification of the control points. I am using this exact code (that’s why I didn’t make a new post).

If the code is run on the CPU, specifying the input of the BSplineCurve constructor as MArrays does the trick (I can change the control points). However, this doesn’t run on the GPU as MArray is not an isbit.

It seems MArray cannot be used on the GPU (I am not sure).

Do you see a simple fix for this, or making it work on the GPU with the possibility of changing BSplineCurve.pnts with new values would require writing all this into CUDA code using Array/CuArray for storage?

Thanks in advance!

Topic		Replies	Views
Solve linear systems inside CUDA kernel function GPU	8	672	February 14, 2024
Kernel Compilation error- KernelError: recursion is currently not supported GPU	8	1315	November 8, 2019
Mathematica BSplineCurve in Julia Visualization splines	17	1897	January 9, 2023
Using functions in GPU Kernel (via KernelAbstractions.jl) (k nearest neighbor kernel) GPU	1	946	January 25, 2021
DifferentialEquations + RecursiveArrayTools Broadcasting / CuArrays Kernel compilation error -- GPU diffeq	6	862	February 4, 2020

KernelAbstractions for splines

Related topics