Performance of kernel function

ennvvy · November 28, 2019, 5:00am

I tried executing the following function on the GPU, for arrays of length 10,000.

function update!(s,c,cedg,θ)
  index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
  stride = blockDim().x * gridDim().x
  @inbounds for l=index:stride:length(s)
    s[l]=cedg[l]*CUDAnative.sin(θ[l])
    c[l]=cedg[l]*CUDAnative.cos(θ[l])
  end
  return nothing
end

However, the performance is similar to that of the same code run on the CPU. Is there something wrong with the way it is written?
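
The post does not show how the kernel was launched, so what follows is only a sketch under assumptions. With CUDAnative, `@cuda` without `threads` and `blocks` launches a single thread, in which case the grid-stride loop above runs serially on the GPU. The block size of 256 and the test data below are illustrative choices, not values from the post.

```julia
using CUDAnative, CuArrays

# Same grid-stride kernel as in the post, with an explicit `return nothing`.
function update!(s, c, cedg, θ)
    index  = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    @inbounds for l = index:stride:length(s)
        s[l] = cedg[l] * CUDAnative.sin(θ[l])
        c[l] = cedg[l] * CUDAnative.cos(θ[l])
    end
    return nothing
end

n    = 10_000
cedg = cu(randn(n))         # illustrative test data, not from the post
θ    = cu(randn(n))
s    = similar(θ)
c    = similar(θ)

threads = 256               # assumed block size; tune for the device
blocks  = cld(n, threads)   # enough blocks to cover all n elements

@cuda threads=threads blocks=blocks update!(s, c, cedg, θ)

# Copying back to the host waits for the kernel, so the results can be checked.
@assert Array(s) ≈ Array(cedg) .* sin.(Array(θ))
@assert Array(c) ≈ Array(cedg) .* cos.(Array(θ))
```

In the later CUDA.jl package, which merged CUDAnative and CuArrays, the same kernel can call plain `sin` and `cos`.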

ennvvy · November 28, 2019, 5:11am

Sorry, I had posted a different version. I have now updated it with the one I use.

baggepinnen · November 28, 2019, 5:19am

using CuArrays, BenchmarkTools

# CPU arrays
s     = zeros(10000);
c     = zeros(10000);
cedg  = rand((0,1),10000) .* randn(10000);
θ     = randn(10000);

# GPU copies (cu converts the elements to Float32)
cs    = cu(s);
cc    = cu(c);
ccedg = cu(cedg);
cθ    = cu(θ);

# CPU method: plain loop over the indices
function update!(s,c,cedg,θ)
  @inbounds for l=eachindex(s)
    s[l]=cedg[l]*sin(θ[l])
    c[l]=cedg[l]*cos(θ[l])
  end
end

# GPU method: broadcasting over CuArrays runs on the GPU
function update!(s::CuArray,c,cedg,θ)
  s .= cedg.*sin.(θ)
  c .= cedg.*cos.(θ)
end

@btime update!($s,$c,$cedg,$θ);
@btime update!($cs,$cc,$ccedg,$cθ);

julia> @btime update!($s,$c,$cedg,$θ);
  132.067 μs (0 allocations: 0 bytes)

julia> @btime update!($cs,$cc,$ccedg,$cθ);
  10.649 μs (108 allocations: 4.38 KiB)

The vectors have to be long enough for the GPU to be worth it, though; for short vectors the kernel launch overhead dominates.
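
One caveat when timing GPU code: kernel launches are asynchronous, so a benchmark without synchronization may measure mostly the launch rather than the computation. Below is a minimal sketch of a synchronized measurement, following the pattern of wrapping the call in `CuArrays.@sync`. The helper name `update_sync!` and the array length of one million are assumptions made for illustration.

```julia
using CuArrays, BenchmarkTools

# Broadcast version from the post above.
function update!(s::CuArray, c, cedg, θ)
    s .= cedg .* sin.(θ)
    c .= cedg .* cos.(θ)
end

# Hypothetical helper: block until the GPU has finished before returning,
# so that @btime measures the whole computation and not only the launch.
function update_sync!(s, c, cedg, θ)
    CuArrays.@sync begin
        update!(s, c, cedg, θ)
    end
end

n     = 1_000_000           # assumed size, larger than in the post
cθ    = cu(randn(n))
ccedg = cu(randn(n))
cs    = similar(cθ)
cc    = similar(cθ)

@btime update_sync!($cs, $cc, $ccedg, $cθ);
```

Running the same measurement for several values of `n` gives a rough idea of where the GPU starts to pay off on a given machine. In CUDA.jl the equivalent macro is `CUDA.@sync`.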

ennvvy · November 28, 2019, 5:35am

@baggepinnen: Thank you, this worked. I was trying to follow the tutorial on GPU programming using CuArrays and to fit my functions to the pattern of the ones in the example. One clarification: do I not have to specify the number of threads and blocks, or even use @cuda, for it to be executed on the GPU?
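
On that question, the broadcast version in this thread contains neither `@cuda` nor thread or block counts: broadcasting over `CuArray`s compiles and launches a kernel for you, and the package picks the launch configuration. A minimal sketch of how this could be checked, with illustrative data that is not from the thread:

```julia
using CuArrays

# Raise an error instead of silently indexing GPU arrays one element
# at a time, which is what a plain CPU-style loop over a CuArray would do.
CuArrays.allowscalar(false)

θ    = randn(Float32, 10_000)   # illustrative data
cedg = randn(Float32, 10_000)
cθ, ccedg = cu(θ), cu(cedg)

# No @cuda, threads or blocks: the broadcast itself runs on the GPU.
cs = ccedg .* sin.(cθ)

@assert cs isa CuArray                 # the result stays on the GPU
@assert Array(cs) ≈ cedg .* sin.(θ)    # and matches the CPU computation
```

Explicit `@cuda` launches with `threads` and `blocks` are only needed for hand-written kernels such as the one in the first post.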
