Performing vector operations on a specific range of a CuArray

Hello everyone! I’m taking my first steps with Julia and GPU programming, and this is also my first post here, so please forgive me if the question is too basic or has been answered before :) I could not find anything useful in a brief search.

I’ve got a CPU function which essentially performs a simple vector operation on an array.

function cpuf!(
    signal1::Array,
    signal2::Array,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:num_samples_left + start_sample - 1
        signal1[i] = signal1[i] * signal2[i]
    end
end

It operates on an arbitrary range of the array, determined by the arguments start_sample and num_samples_left.

So the naive port below would probably fall back to scalar indexing on a CuArray and slow everything down, right?

function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:num_samples_left + start_sample - 1
        signal1[i] = signal1[i] * signal2[i]
    end
end

If so, how do I get something like the following to work with maximum performance?

function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    signal1 = signal1 .* signal2
    # where signal1 and signal2 are restricted to the range start_sample:num_samples_left + start_sample - 1
end

Do I need to split the arrays on the CPU and then upload the pieces to the GPU?

Broadcasting should give you good performance; you can use views to restrict it to the range you need. If you need more control, you can write a kernel function that performs a single iteration and launch it in parallel.
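As a rough sketch of the views-plus-broadcasting suggestion, something along these lines should work. The name mulrange! is made up for illustration, and the signature is loosened to AbstractVector so the same method can be tried on plain Arrays first; the assumption is that CuArrays from CUDA.jl go through the same code path without scalar indexing.

```julia
# Hypothetical helper (name chosen for illustration only).
# Accepts any AbstractVector; passing CuArrays assumes CUDA.jl is loaded
# and both arrays already live on the GPU.
function mulrange!(
    signal1::AbstractVector,
    signal2::AbstractVector,
    start_sample::Integer,
    num_samples_left::Integer
)
    r = start_sample:(start_sample + num_samples_left - 1)
    # @views makes the slices views instead of copies;
    # .*= performs the multiplication in place as a single fused broadcast.
    @views signal1[r] .*= signal2[r]
    return signal1
end

# CPU usage example
a = Float32[1, 2, 3, 4, 5]
b = Float32[10, 10, 10, 10, 10]
mulrange!(a, b, 2, 3)   # multiplies elements 2, 3 and 4 of a in place
```

With the arrays uploaded once (for example via CuArray(a) and CuArray(b)), no splitting on the CPU should be needed, since the view only selects which part of the device array the broadcast touches.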