Sending data back and forth between the CPU and the GPU is costly. Why not perform the sum on the GPU?
Also, if you’re going to run part of this on the GPU, I would guess you need to keep the entire algorithm there to avoid excessive data transfer.
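
To put a rough number on that, here is a back-of-the-envelope sketch in plain Python. It does not touch a real GPU; it only counts the bytes that would cross the bus in each case. The function names and the float32 element size are my own assumptions for illustration, not anything taken from your code.

```python
# Hypothetical byte-count comparison; no GPU library is used here.
# Assumption: the array holds float32 values (4 bytes each).
BYTES_PER_FLOAT32 = 4


def bytes_if_sum_on_cpu(n_elements: int) -> int:
    """The whole array is copied device -> host, then summed on the CPU."""
    return n_elements * BYTES_PER_FLOAT32


def bytes_if_sum_on_gpu(n_elements: int) -> int:
    """The array is reduced on the device; only the scalar result is copied back."""
    return BYTES_PER_FLOAT32


n = 1_000_000
print(bytes_if_sum_on_cpu(n))  # 4000000
print(bytes_if_sum_on_gpu(n))  # 4
```

If the sum happens inside a loop, the first figure is paid on every iteration, which is why keeping the reduction (and ideally the whole loop) on the device tends to matter.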