Performing vector operations on a specific range of a CuArray

ozmaden · May 18, 2020, 5:16pm

Hello everyone! I’m taking my first steps with Julia and GPU programming, and this is my first post here. Please forgive me if the question is too basic or has been answered before; I could not find anything useful in my brief search.

I’ve got a CPU function which essentially performs a simple vector operation on an array.

function cpuf!(
    signal1::Array,
    signal2::Array,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:num_samples_left + start_sample - 1
        signal1[i] = signal1[i] * signal2[i]
    end
end

It operates on an arbitrary range of the array, determined by the arguments start_sample and num_samples_left.

So the naive port below would presumably index the CuArray one element at a time (scalar indexing) and slow things down, right?

function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:num_samples_left + start_sample - 1
        signal1[i] = signal1[i] * signal2[i]
    end
end

If so, how do I write something like the following so that it runs with maximum performance?

function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    signal1 = signal1 .* signal2 
    #where signal1 and signal2 are in the range [start_sample:num_samples_left + start_sample - 1]
end

Do I need to slice the arrays on the CPU and then upload the slices to the GPU?

maleadt · May 19, 2020, 5:48am

Broadcasting should give you good performance; you can use views to limit it to the range you need. If you need more control, you can write a kernel function that performs a single iteration and launch it in parallel.
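A minimal sketch of the view-plus-broadcast suggestion, for illustration only. The name `scale_range!` is hypothetical and not from this thread. It is written against `AbstractVector` so that it runs on plain CPU arrays; with CUDA.jl loaded, passing `CuArray`s should take the same code path and run as a broadcast on the GPU, but that is an assumption not exercised here.

```julia
# Hypothetical helper: multiply a sub-range of `signal1` by the matching
# sub-range of `signal2`, in place.
function scale_range!(signal1::AbstractVector, signal2::AbstractVector,
                      start_sample::Integer, num_samples::Integer)
    # Same bounds as the original loop: start_sample through start_sample + num_samples - 1
    r = start_sample:start_sample + num_samples - 1
    # `@views` makes both slices views (no copies);
    # `.*=` performs the multiplication as one in-place broadcast.
    @views signal1[r] .*= signal2[r]
    return signal1
end

a = collect(1.0:6.0)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = fill(2.0, 6)
scale_range!(a, b, 2, 3)
# a is now [1.0, 4.0, 6.0, 8.0, 5.0, 6.0]; elements outside 2:4 are untouched
```

Writing the update as `.*=` rather than assigning the result of `.*` avoids allocating a temporary array for the product.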

ozmaden · July 7, 2020, 8:11pm

Looking back, my question might have been ill-posed. The functionality I was looking for is views and SubArrays. So the answer I was after goes something like this:

using CUDA  # provides CuArray

function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    # Indices to update, matching the loop bounds of the CPU version
    sample_range = start_sample:num_samples_left + start_sample - 1
    # @views turns both slices into views; .*= multiplies in place
    @views signal1[sample_range] .*= signal2[sample_range]
    return signal1
end
