 Problem with GPU programming

I have the following code, which runs on Julia v1.1:

using LinearAlgebra

L=Symmetric(rand(Float32,10000,10000))
C=zeros(Float32,size(L,1),size(L,2))
D=zeros(Float32,size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C::Array{Float32,2}, D::Array{Float32,2}, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

for i=2:size(C,1) Lapprox!(C,D,i) end

I want to take advantage of GPU programming, but I am just getting started with this type of programming. So far, I have tried the following code:

using LinearAlgebra
using CuArrays
using CUDAnative

L=cu(Symmetric(rand(10000,10000)))
C=CuArray{Float32}(undef, size(L,1),size(L,2))
D=CuArray{Float32}(undef, size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C, D, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

@cuda for i=2:size(C,1) Lapprox!(C,D,i) end

but it doesn’t work… Can anyone give me some insights on the Lapprox! function and the @cuda macro line, please? Thank you!

That’s not how GPU programming works. Either you write a kernel function (not a for loop, as you’re doing here) and launch it with @cuda, or you write a function that you broadcast over an array, which runs on the GPU if you use CuArrays instead of regular Arrays.
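For illustration, here is a minimal sketch of the kernel-function approach, using the CuArrays/CUDAnative packages mentioned in this thread; the kernel add_one! and its launch configuration are made up for the example and are not part of the original code:

```julia
using CuArrays, CUDAnative

# A kernel computes its own index from the launch configuration,
# processes one element, and must return nothing.
function add_one!(A)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(A)        # guard: more threads may be launched than elements
        @inbounds A[i] += 1f0
    end
    return nothing
end

A = CuArray(zeros(Float32, 1024))
# Launch 4 blocks of 256 threads, covering all 1024 elements.
@cuda threads=256 blocks=4 add_one!(A)
```

The @cuda macro compiles the function for the GPU and launches it once per thread; the loop over elements is replaced by the thread index computation inside the kernel.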

Ok, thanks. I tried the function

function Lapprox!(C, D)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    m = min(i, k)
    N = Int32.(partialsortperm(L[i+1, 1:i], 1:m, rev=true))
    C[i+1, N] = L[N, N] \ L[N, i+1]
    D[i+1, i+1] = L[i+1, i+1] - transpose(L[i+1, N]) * C[i+1, N]
    return nothing
end

and launched it with @cuda threads=12 Lapprox!(C,D), but it doesn’t work.

Those heavyweight functions (transpose, partialsortperm) are probably not GPU compatible. Furthermore, C and D need to live in GPU memory, i.e. be CuArrays. You’ll need to get some CUDA experience; using CUDAnative in Julia works at a similar abstraction level. Have a look at this tutorial: https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/

For a higher level of abstraction, you can use the broadcasting functionality from CuArrays (similar to broadcasting on regular Arrays), but you still need to take care not to call GPU-incompatible functionality.
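As a sketch of that broadcasting approach (the variables x and y are made up for the example), dotted operations on a CuArray are fused and compiled into a single GPU kernel automatically:

```julia
using CuArrays

# cu moves the data to GPU memory, converting to Float32 by default
x = cu(rand(10_000))

# The fused broadcast runs as one GPU kernel; only scalar,
# GPU-compatible functions (e.g. sin, ^) may appear inside it.
y = x .^ 2 .+ sin.(x)
```

This avoids writing an explicit kernel, but functions like partialsortperm that operate on whole slices cannot be used this way.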


Yes, I think the problem is with the non-compatible functions; I think I saw something like that in the error message. Thanks for your suggestion, I will try to deal with it. CUDA seems interesting, but if I can’t make it work I will go back to my regular code. As you said, it requires experience, and I’m still new to GPU programming.