Dear all,
I have written a simple Euler stepping algorithm for a particular problem, which runs on the GPU. The vector y_current is big (~200_000 elements) and is updated as y_current .= y_current .+ dt * RHS. A working example can be found here. It was done in ArrayFire. I also have a working example in CLArrays, but it is 10x slower.
I can’t save every step in a matrix because I need a lot of steps. Instead, I want to record sum(y_current) at every step. However, including a save mechanism slows the whole thing down by a factor of 50, which makes it useless.
At the moment I am doing something like
for ii = 1:10000
    updateEuler!(y_current, dt)
    # save data; a is owned by the GPU
    a[ii] = sum(y_current)
end
This is not written in an array (vectorized) fashion, so it is indeed slow.
Does anyone know a trick to save the sum value while staying on the GPU? Maybe one has to write a specific kernel…
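To make the pattern concrete, here is a minimal sketch of one possible direction, written with plain CPU arrays as a stand-in for the GPU arrays. The idea is to let the reduction write directly into a one-element view of a via sum!, instead of returning a scalar that is then stored by index. The body of updateEuler! and the helper name run_euler! are hypothetical (the real RHS is not shown here, so a linear decay is assumed), and whether ArrayFire or CLArrays execute sum! on a view without a host round trip is an assumption, not something verified here.

```julia
# Hypothetical stand-in for the real update: the actual RHS is not given,
# so RHS = -y (linear decay) is assumed purely for illustration.
function updateEuler!(y, dt)
    y .= y .+ dt .* (-y)
    return y
end

# Hypothetical helper: runs one Euler step per entry of `a` and records
# sum(y) after each step.
function run_euler!(a, y, dt)
    for ii in eachindex(a)
        updateEuler!(y, dt)
        # Reduce into a one-element view of `a`, so the result is written
        # in place rather than returned as a scalar and stored by index.
        sum!(view(a, ii:ii), y)
    end
    return a
end

y_current = ones(Float32, 200_000)   # plain Array standing in for a GPU array
a = zeros(Float32, 10_000)           # one recorded sum per step
run_euler!(a, y_current, 1f-3)
```

On the CPU this behaves the same as a[ii] = sum(y_current); the only point of the sketch is that the reduction and the store are expressed as a single array operation, which is the form a GPU array library would need in order to keep the result on the device.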
Thank you for your help and suggestions,
Best regards,