Hello everyone! I'm taking my first steps with Julia and GPU programming, and this is also my first post here. Please forgive me if the question is too banal or has been answered before; I could not find anything useful in my brief research.

I’ve got a CPU function which essentially performs a simple vector operation on an array.

```
function cpuf!(
    signal1::Array,
    signal2::Array,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:(start_sample + num_samples_left - 1)
        signal1[i] *= signal2[i]
    end
end
```

It operates on an arbitrary range of the array, determined by the arguments `start_sample` and `num_samples_left`.
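For reference, here is a minimal usage sketch of the CPU version above (array contents and range chosen arbitrarily):

```
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
cpuf!(a, b, 2, 2)   # multiplies elements 2:3 in place
# a is now [1.0, 20.0, 30.0, 4.0]
```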

So the naive solution below would presumably trigger scalar indexing on a CuArray and slow the process down, right?

```
function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    for i = start_sample:(start_sample + num_samples_left - 1)
        signal1[i] *= signal2[i]
    end
end
```
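If I understand the docs correctly, CUDA.jl warns on scalar indexing by default; a sketch like this (assuming the `gpuf!` definition above) should turn it into a hard error so the problem is easy to spot:

```
using CUDA

CUDA.allowscalar(false)   # make scalar indexing an error instead of a warning

s1 = CUDA.rand(Float32, 1024)
s2 = CUDA.rand(Float32, 1024)
gpuf!(s1, s2, 1, 1024)    # should error: scalar indexing of a CuArray
```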

If so, how do I write something like the code below with maximum performance?

```
function gpuf!(
    signal1::CuArray,
    signal2::CuArray,
    start_sample::Integer,
    num_samples_left::Integer
)
    rng = start_sample:(start_sample + num_samples_left - 1)
    # Broadcast over views so the multiplication happens in place on the GPU.
    # Note: `signal1 = signal1 .* signal2` would only rebind the local variable
    # and allocate a new array instead of mutating the input.
    @views signal1[rng] .*= signal2[rng]
end
```

Or do I need to do the splitting on the CPU and then upload the slices to the GPU?