Sum is very slow (and I can't figure out why)

Try CUDA.@time; as written, you are not measuring the GPU computation time.
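A minimal sketch of what that could look like, assuming CUDA.jl and a working GPU. The array `x` and its size are made up for illustration, since the code from the original question is not shown here. GPU work in CUDA.jl is queued asynchronously, so a host-side timer that does not synchronize can stop before the device has finished:

```julia
using CUDA

# Hypothetical input: the array from the original question is not shown.
x = CUDA.rand(Float32, 10_000_000)

# Warm-up call so that compilation time is excluded from the timings below.
sum(x)

# Base.@time is a host-side timer and does not itself synchronize the device.
@time sum(x)

# CUDA.@time synchronizes around the expression and also reports GPU allocations.
CUDA.@time sum(x)

# Alternative: keep the host timer but force synchronization explicitly.
@time CUDA.@sync sum(x)
```

For repeated measurements, the same `CUDA.@sync` wrapper can be combined with a benchmarking package; the first call of any function should still be excluded, since it includes compilation.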
