I am confused about this. I can write an alternating sum function as below:
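function altsum(x)
    # Accumulate in the element type of x; typeof(x) would be the array type.
    res = zero(eltype(x))
    d = one(eltype(x))  # sign: +1, -1, +1, ...
    for x1 in x
        res = res + d * x1
        d = -d
    end
    return res
end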
When I run that, what gets executed in parallel? The loop inside the function? How should I think about a problem like this when targeting GPUs?
using CuArrays, GPUArrays
x = rand(1_000_000)
gpux = CuArray(x)
@time altsum(x)
@time altsum(gpux)
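For comparison, the explicit loop visits one element at a time, so nothing in it is parallel by construction. A formulation built from whole-array operations (indexing with a range, then a reduction) is the kind of code an array library can hand to the GPU as parallel kernels. Below is a minimal sketch of that idea, run here on a plain CPU array; `altsum_split` is a made-up name, and whether strided indexing and `sum` are fast on a `CuArray` is an assumption that would need checking against the installed package version.

```julia
# Alternating sum x[1] - x[2] + x[3] - ... written with whole-array operations.
# altsum_split is a hypothetical helper, not part of any package.
function altsum_split(x::AbstractVector)
    # Odd positions carry a + sign, even positions a - sign,
    # so the result is two independent reductions.
    return sum(x[1:2:end]) - sum(x[2:2:end])
end

x = rand(1_000_000)
@time altsum_split(x)
```

The two `sum` calls do not depend on each other, and each one is a reduction with no loop-carried sign variable, which is what makes this shape easier to parallelize than the loop version. The floating-point result can differ slightly from the loop because the additions happen in a different order.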