Problem with GPU programming

I have the following code, which runs on Julia v1.1:

using LinearAlgebra

L=Symmetric(rand(Float32,10000,10000))
C=zeros(Float32,size(L,1),size(L,2))
D=zeros(Float32,size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C::Array{Float32,2},D::Array{Float32,2},i)
    m=min((i-1),k)
    N=Int32.(partialsortperm(L[i,1:(i-1)],1:m,rev=true))
    C[i,N]=L[N,N]\L[N,i]
    D[i,i]=L[i,i] - transpose(L[i,N])*C[i,N]
end

for i=2:size(C,1) Lapprox!(C,D,i) end

I want to take advantage of GPU programming, but I am new to this type of programming… So far, I have tried the following code:

using LinearAlgebra
using CuArrays
using CUDAnative

L=cu(Symmetric(rand(10000,10000)))
C=CuArray{Float32}(undef, size(L,1),size(L,2))
D=CuArray{Float32}(undef, size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C,D,i)
    m=min((i-1),k)
    N=Int32.(partialsortperm(L[i,1:(i-1)],1:m,rev=true))
    C[i,N]=L[N,N]\L[N,i]
    D[i,i]=L[i,i] - transpose(L[i,N])*C[i,N]
end

@cuda for i=2:size(C,1) Lapprox!(C,D,i) end

but it doesn’t work… Can anyone give me some insight into the Lapprox! function and the @cuda macro line, please? Thank you!

That’s not how GPU programming works. Either you write a kernel function (not a for loop as you’re using here) and launch it with @cuda, or you write a function that you broadcast over an array, which runs on the GPU when you use CuArrays.
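For illustration, a minimal kernel in the first style might look like this (a sketch only, assuming CUDAnative/CuArrays and a CUDA-capable GPU; the kernel and array names are made up, it just squares each element):

```julia
using CuArrays, CUDAnative

# Kernel: one thread per element; each thread writes a single entry of y.
function square_kernel!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)          # guard: excess threads must not write out of bounds
        @inbounds y[i] = x[i] * x[i]
    end
    return nothing             # kernels must return nothing
end

x = cu(rand(Float32, 1024))
y = similar(x)
# Launch enough blocks of 256 threads to cover every element.
@cuda threads=256 blocks=cld(length(y), 256) square_kernel!(y, x)
```

Note the bounds check and the `return nothing`: a kernel runs once per thread over plain scalar indexing, rather than containing a loop over the whole array.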

Ok, thanks :slight_smile: I tried the function

function Lapprox!(C,D)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    m=min(i,k)
    N=Int32.(partialsortperm(L[i+1,1:i],1:m,rev=true))
    C[i+1,N]=L[N,N]\L[N,i+1]
    D[i+1,i+1]=L[i+1,i+1] - transpose(L[i+1,N])*C[i+1,N]
    return nothing
end

and @cuda threads=12 Lapprox!(C,D) but it doesn’t work.

Those heavyweight functions (transpose, partialsortperm) are probably not GPU compatible. Furthermore, you need to be using GPU memory, i.e. CuArrays, for C and D. You’ll need some CUDA experience; using CUDAnative in Julia works at a similar abstraction level. Have a look at this tutorial: https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/

For a higher-level approach, you can use the broadcasting functionality from CuArrays (similar to broadcasting on regular Arrays), but you’ll still need to take care not to call GPU-incompatible functionality.
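As a sketch of that broadcasting style (again assuming a CUDA-capable GPU; the arrays here are illustrative), the same elementwise computation needs no hand-written kernel at all:

```julia
using CuArrays

x = cu(rand(1024))    # cu() copies to GPU memory, converting to Float32
y = similar(x)
y .= x .^ 2           # the fused broadcast is compiled to a GPU kernel
```

As long as the function being broadcast is simple scalar code, CuArrays compiles and launches the kernel for you, which is usually the easier place to start.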

Yes, I think the problem is with the incompatible functions. I think I saw something like this in the error message. Thanks for your suggestion :slight_smile: I will try to deal with it. CUDA seems interesting, but if I’m not able to do it, I will go back to my regular code :slight_smile: As you said, it requires experience and I’m still new to GPU programming.