# Problem with GPU programming

I have the following code that I can run on Julia v1.1:

```julia
using LinearAlgebra

L = Symmetric(rand(Float32, 10000, 10000))
C = zeros(Float32, size(L, 1), size(L, 2))
D = zeros(Float32, size(L, 1), size(L, 2))
D[1, 1] = 1
k = 750

function Lapprox!(C::Array{Float32,2}, D::Array{Float32,2}, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

for i = 2:size(C, 1)
    Lapprox!(C, D, i)
end
```

I want to take advantage of GPU programming, but I am just starting with this type of programming… So far, I have tried the following code:

```julia
using LinearAlgebra
using CuArrays
using CUDAnative

L = cu(Symmetric(rand(10000, 10000)))
C = CuArray{Float32}(undef, size(L, 1), size(L, 2))
D = CuArray{Float32}(undef, size(L, 1), size(L, 2))
D[1, 1] = 1
k = 750

function Lapprox!(C, D, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

@cuda for i = 2:size(C, 1) Lapprox!(C, D, i) end
```

but it doesn't work… Can anyone give me some insights into the `Lapprox!` function and the `@cuda` macro line, please? Thank you!

That's not how GPU programming works. Either you write a kernel function (not a for loop, as you're doing here) and launch it with `@cuda`, or you write a function that you broadcast over an array, which you can run on the GPU by using CuArrays instead.
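To illustrate the kernel style this reply describes, here is a minimal sketch (the `square_kernel!` function is a made-up example, not related to the original problem; it assumes a CUDA-capable GPU with CuArrays and CUDAnative installed):

```julia
using CuArrays, CUDAnative

# Kernel style: each thread computes one element and the kernel returns nothing.
function square_kernel!(out, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)           # guard: more threads may be launched than elements
        @inbounds out[i] = x[i] * x[i]
    end
    return nothing
end

x = cu(rand(Float32, 1024))
out = similar(x)
@cuda threads=256 blocks=4 square_kernel!(out, x)

# Broadcast style: the same computation without writing a kernel by hand;
# CuArrays compiles the fused broadcast into a single GPU kernel.
out2 = x .* x
```

Note the structural differences from the code in the question: the loop over elements is replaced by the thread index computation, the kernel only does scalar work per thread, and it returns `nothing`.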

OK, thanks. I tried the function

```julia
function Lapprox!(C, D)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    m = min(i, k)
    N = Int32.(partialsortperm(L[i+1, 1:i], 1:m, rev=true))
    C[i+1, N] = L[N, N] \ L[N, i+1]
    D[i+1, i+1] = L[i+1, i+1] - transpose(L[i+1, N]) * C[i+1, N]
    return nothing
end
```

and launched it with `@cuda threads=12 Lapprox!(C,D)`, but it doesn't work.

Those heavyweight functions (`transpose`, `partialsortperm`) are probably not GPU compatible. Furthermore, you need to be using GPU memory, i.e. `CuArray`s, for `C` and `D`. You'll need to get some CUDA experience; using CUDAnative in Julia works at a similar abstraction level. Have a look at this tutorial: https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/

For a higher abstraction level, you can use the broadcasting functionality from CuArrays (similar to broadcasting on regular `Array`s), but you'll also need to take care not to call GPU-incompatible functionality.


Yes, I think the problem is with the non-compatible functions; I think I saw something like that in the error message. Thanks for your suggestion, I will try to deal with it. CUDA seems interesting, but if I'm not able to do it, I will go back to my regular code. As you said, it requires experience, and I'm still new to GPU programming.