Hi,
I ran into two questions when using the `CUDA.dot()` function. My understanding is that `CUDA.dot()` performs the reduction on the GPU and then returns a CPU scalar. I measured the running time of `CUDA.dot()` and compared it with the time it takes to copy a single value from the GPU to the CPU.

```julia
using CUDA
using BenchmarkTools

a = CUDA.rand(Float64, 10_000)
b = CUDA.rand(Float64, 10_000)
gpuvalue = CuArray([1.0])

@btime CUDA.@sync CUDA.dot($a, $b)   # 68.30 μs
@btime CUDA.@sync Array($gpuvalue)   # 46.5 μs
```
• The two measured times are very close. Can I assume that the difference between them is the time `CUDA.dot()` spends computing on the GPU? In other words, is most of `CUDA.dot()`'s time in this case spent copying the result from the GPU back to the CPU?
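One way to separate the kernel time from the host-side overhead is `CUDA.@elapsed`, which uses CUDA events to time only the launched GPU work. A minimal sketch (assuming a CUDA-capable GPU is available; the exact numbers will differ per machine):

```julia
using CUDA
using BenchmarkTools

a = CUDA.rand(Float64, 10_000)
b = CUDA.rand(Float64, 10_000)

# CUDA.@elapsed brackets the GPU work with events, so the result
# excludes most host-side (Julia/driver) overhead.
t_kernel = CUDA.@elapsed CUDA.dot(a, b)
println("kernel time: ", t_kernel, " s")

# For wall-clock comparison, interpolate globals with $ so
# BenchmarkTools does not add dynamic-dispatch overhead.
@btime CUDA.@sync CUDA.dot($a, $b)
```

Comparing `t_kernel` against the `@btime` wall-clock figure gives a rough split between computation and launch/transfer overhead, though a profiler gives the definitive answer.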

The second question concerns the order of execution when using the CUDA.dot() function in more complex expressions.

```julia
using CUDA

a = CUDA.rand(Float64, 10_000)
b = CUDA.rand(Float64, 10_000)
c = CUDA.rand(Float64, 10_000)
d = CUDA.rand(Float64, 10_000)

value = CUDA.dot(a, b) / CUDA.dot(c, d)
```
• In the expression above, is `CUDA.dot(a,b)` computed first, then `CUDA.dot(c,d)`, and finally the two scalar results divided? Or are the two dot products executed in parallel on the GPU, so that the numerator and denominator are obtained concurrently before the final division?

Thanks!

Not necessarily: part of the `Array(gpuvalue)` execution time may be Julia-side overhead that can overlap with GPU computation. The best way to find out is to profile with NSight Compute.

They will not. Within a Julia task, all CUDA operations are executed on the same stream and therefore run serially. If you want the operations to overlap, use separate tasks: for example, wrap each dot product in `@async` and `fetch` the results, on the condition that you first synchronize after producing the inputs.
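A minimal sketch of that task-based approach (assuming a CUDA-capable GPU; CUDA.jl gives each Julia task its own stream, which is what allows the two reductions to overlap):

```julia
using CUDA

a = CUDA.rand(Float64, 10_000)
b = CUDA.rand(Float64, 10_000)
c = CUDA.rand(Float64, 10_000)
d = CUDA.rand(Float64, 10_000)

# Make sure the inputs are fully materialized before other
# tasks (i.e., other streams) start consuming them.
CUDA.synchronize()

# Each task runs on its own stream, so the two dot products
# can overlap on the GPU if resources allow.
t1 = @async CUDA.dot(a, b)
t2 = @async CUDA.dot(c, d)

value = fetch(t1) / fetch(t2)
```

Whether the two kernels actually overlap in practice depends on the GPU's occupancy; for vectors this small, the launch overhead may dominate either way.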
