What does and does not get computed on the GPU? (CuArrays)

Is there an easy explanation or “rule of thumb” about which computations are run on the GPU?

For example, if there is a matrix ‘W’ and an array ‘a’, both on the GPU, I am sure that W*a is computed on the GPU. But what about other statements such as

a = 2.f0 * a
a[1:10] .= 0
for i=1:10 a[i] = sin(i * 0.1f0) end

and others.

Do some of these require bringing the array back to the CPU, computing there, and sending the result back?
That would be slower than just computing on the CPU.

Does the computation ever silently get sent to the CPU, computed there, and sent back?

The question is motivated by Flux (which uses CuArrays) and Knet, but I am interested in understanding this more broadly.

No. As long as you keep working with the CuArray type and it doesn’t get “demoted” to a plain Array (I’ve seen some broadcast expressions do that), everything will be computed on the GPU. The only exception is scalar expressions (element-by-element indexing, as in that for loop), but you should get a warning for those (and if you care about performance, you can disallow them by calling CuArrays.allowscalar(false)).
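
To make the rule of thumb concrete, here is a minimal sketch of the statements from the question written so that they should stay on the GPU. It assumes a CUDA-capable GPU and the CuArrays package. The variable names `b`, `idx`, and `a_host` are made up for illustration. Whether broadcast assignment into a slice such as `a[1:10]` runs as a GPU kernel may depend on the CuArrays version. With scalar indexing disallowed, a fallback to element-by-element access would raise an error rather than run slowly.

```julia
using CuArrays  # assumes a CUDA-capable GPU is available

# Turn scalar indexing of a CuArray into an error instead of a slow fallback.
CuArrays.allowscalar(false)

W = CuArray(rand(Float32, 100, 100))
a = CuArray(rand(Float32, 100))

b = W * a          # matrix-vector product, result is a CuArray
a .= 2f0 .* a      # fused in-place broadcast on the GPU
a[1:10] .= 0f0     # broadcast assignment into a slice (may be version dependent)

# Vectorized replacement for the scalar for loop in the question:
# build the indices on the GPU, then broadcast sin over them.
idx = CuArray(Float32.(1:10))
a[1:10] .= sin.(idx .* 0.1f0)

# Check that the result has not been demoted to a CPU Array.
@assert b isa CuArray

# Copy back to the CPU explicitly, and only when the values are needed there.
a_host = Array(a)
```

The design choice here is to express every element-wise operation as a broadcast over whole arrays, so nothing needs to index the CuArray one element at a time from the CPU.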