I have written a simple Euler stepping algorithm for a particular problem, and it runs on the GPU. The vector
`y_current` is big (~200_000 elements) and is updated like
`y_current .= y_current .+ dt * RHS`. A working example can be found here. It was done in ArrayFire. I also have a working example in CLArrays, but it is 10x slower.
I can’t save every step in a matrix because I need a lot of steps. Instead, I want to record
`sum(y_current)` at every step. However, including a save mechanism makes the whole thing 50 times slower, which defeats the purpose of running on the GPU.
Hence, I am doing something like
```julia
for ii = 1:10000
    updateEuler!(y_current, dt)
    # save data; `a` is owned by the GPU
    a[ii] = sum(y_current)
end
```
Since the save step is not written in an array (vectorized) fashion, this is indeed slow.
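To make the pattern concrete, here is a minimal CPU-only sketch of the same loop using plain Julia arrays. The right-hand side `rhs`, the wrapper `run_euler!`, the step size and the step count are placeholders for illustration, not my actual problem. In the GPU version, `y_current` and `a` would be device arrays instead.

```julia
# Placeholder right-hand side (an assumption for illustration): dy/dt = -y
rhs(y) = -y

# One explicit Euler step, updating y in place
function updateEuler!(y, dt)
    y .= y .+ dt .* rhs(y)
    return y
end

# Step nsteps times and record sum(y) after every step
function run_euler!(y, dt, nsteps)
    a = zeros(eltype(y), nsteps)   # one scalar per step instead of the full state
    for ii in 1:nsteps
        updateEuler!(y, dt)
        a[ii] = sum(y)             # the per-step reduction that I want to keep on the GPU
    end
    return a
end

y_current = ones(Float32, 200_000)
a = run_euler!(y_current, 0.01f0, 100)
```

Storing only `a` keeps the memory cost at one number per step, which is why I want the reduction rather than the full history.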
Does anyone know a trick to save the sum value while staying on the GPU? Maybe one has to write a specific kernel…
Thank you for your help and suggestions,