 Problem with GPU programming

I have the following code, which runs on Julia v1.1:

using LinearAlgebra

L=Symmetric(rand(Float32,10000,10000))
C=zeros(Float32,size(L,1),size(L,2))
D=zeros(Float32,size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C::Array{Float32,2}, D::Array{Float32,2}, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

for i=2:size(C,1) Lapprox!(C,D,i) end

I want to take advantage of GPU programming, but I am just getting started with this type of programming. So far, I have tried the following code:

using LinearAlgebra
using CuArrays
using CUDAnative

L=cu(Symmetric(rand(10000,10000)))
C=CuArray{Float32}(undef, size(L,1),size(L,2))
D=CuArray{Float32}(undef, size(L,1),size(L,2))
D[1,1]=1
k=750

function Lapprox!(C, D, i)
    m = min(i - 1, k)
    N = Int32.(partialsortperm(L[i, 1:(i-1)], 1:m, rev=true))
    C[i, N] = L[N, N] \ L[N, i]
    D[i, i] = L[i, i] - transpose(L[i, N]) * C[i, N]
end

@cuda for i=2:size(C,1) Lapprox!(C,D,i) end

but it doesn’t work… Can anyone give me some insights on the Lapprox! function and the @cuda macro line, please? Thank you!

That’s not how GPU programming works. Either you write a kernel function (not a for loop, as you’re doing here) and launch it with @cuda, or you write a function that you broadcast over an array, which runs on the GPU if you use CuArrays instead of regular Arrays.
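For illustration, here is a minimal sketch of the kernel-function approach, using the CuArrays/CUDAnative packages mentioned in this thread; the kernel add_one! and its launch configuration are made up for the example and are not part of the original code:

```julia
using CuArrays, CUDAnative

# A kernel computes its own index from the launch configuration,
# processes one element, and must return nothing.
function add_one!(A)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(A)        # guard: more threads may be launched than elements
        @inbounds A[i] += 1f0
    end
    return nothing
end

A = CuArray(zeros(Float32, 1024))
# Launch 4 blocks of 256 threads, covering all 1024 elements.
@cuda threads=256 blocks=4 add_one!(A)
```

The @cuda macro compiles the function for the GPU and launches it once per thread; the loop over elements is replaced by the thread index computation inside the kernel.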

Ok, thanks. I tried the function

function Lapprox!(C, D)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    m = min(i, k)
    N = Int32.(partialsortperm(L[i+1, 1:i], 1:m, rev=true))
    C[i+1, N] = L[N, N] \ L[N, i+1]
    D[i+1, i+1] = L[i+1, i+1] - transpose(L[i+1, N]) * C[i+1, N]
    return nothing
end

and launched it with @cuda threads=12 Lapprox!(C,D), but it doesn’t work.

Those heavyweight functions (transpose, partialsortperm) are probably not GPU compatible. Furthermore, C and D need to live in GPU memory, i.e. be CuArrays. You’ll need to get some CUDA experience; using CUDAnative in Julia works at a similar abstraction level. Have a look at this tutorial: https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/

For a higher level of abstraction, you can use the broadcasting functionality from CuArrays (similar to broadcasting on regular Arrays), but you still need to take care not to call GPU-incompatible functionality.
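As a sketch of that broadcasting approach (the variables x and y are made up for the example), dotted operations on a CuArray are fused and compiled into a single GPU kernel automatically:

```julia
using CuArrays

# cu moves the data to GPU memory, converting to Float32 by default
x = cu(rand(10_000))

# The fused broadcast runs as one GPU kernel; only scalar,
# GPU-compatible functions (e.g. sin, ^) may appear inside it.
y = x .^ 2 .+ sin.(x)
```

This avoids writing an explicit kernel, but functions like partialsortperm that operate on whole slices cannot be used this way.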


Yes, I think the problem is with the non-compatible functions; I think I saw something like that in the error message. Thanks for your suggestion, I will try to deal with it. CUDA seems interesting, but if I can’t make it work I will go back to my regular code. As you said, it requires experience, and I’m still new to GPU programming.