I have the following, which gives the correct result, but I need to keep the result on the device.
using CUDA
diffs = CuArray([-0.1 0.2 0.1; -0.2 -0.2 -0.3])
firstmatch = map(x->CUDA.findfirst(y->y<0,x),CUDA.eachslice(diffs,dims=2))
The output is no longer on the device:
3-element Array{Int64,1}:
1
2
2
Converting firstmatch back to a CuArray (moving it back to the device) is counterproductive for me here.
I think you can / should do this by writing a kernel similar to CUDA.findfirst.
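For illustration, here is a minimal sketch of what such a kernel might look like, not a tested or tuned implementation. It assumes one thread per column is acceptable. The name first_negative_kernel!, the launch size of 256 threads, and the choice to store 0 when a column has no negative entry (findfirst would return nothing instead) are all my own assumptions.

```julia
using CUDA

# Hypothetical kernel: each thread scans one column of `diffs` and stores the
# row index of the first negative entry in `out`, or 0 if there is none.
function first_negative_kernel!(out, diffs)
    j = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if j <= size(diffs, 2)
        idx = 0
        for i in 1:size(diffs, 1)
            if diffs[i, j] < 0
                idx = i
                break
            end
        end
        out[j] = idx
    end
    return nothing
end

diffs = CuArray([-0.1 0.2 0.1; -0.2 -0.2 -0.3])
firstmatch = CUDA.zeros(Int, size(diffs, 2))   # allocated on the device

threads = 256                                  # assumed launch size
blocks = cld(size(diffs, 2), threads)
@cuda threads=threads blocks=blocks first_negative_kernel!(firstmatch, diffs)
```

Here firstmatch is a CuArray throughout, so nothing is copied back to the host unless you call Array(firstmatch) yourself.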
Thanks, I’ll try that approach.
Can you comment on the difference between CuArray and CuDeviceArray, and why the latter was specified for xs on line 101?
The documentation on CuDeviceArray didn’t really help clear this up for me.
I’m not sure, but I’m guessing it refers to an object that is known to reside on the device when used inside device code.
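A small sketch that should make the distinction visible, assuming a working CUDA.jl setup. As far as I understand, CuArray is the host-side handle you allocate and pass around, while CuDeviceArray is the device-side counterpart that a kernel actually receives once @cuda has converted its arguments. The kernel name double_kernel! is made up for this example.

```julia
using CUDA

# Hypothetical kernel: the annotation matches the device-side array type,
# which is what the kernel sees after @cuda has converted its arguments.
function double_kernel!(xs::CuDeviceArray)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(xs)
        xs[i] *= 2
    end
    return nothing
end

xs = CuArray([1, 2, 3])                       # host-side handle to GPU memory
@cuda threads=length(xs) double_kernel!(xs)   # xs is converted before the kernel runs

dev = cudaconvert(xs)                         # the same conversion, done by hand
println(typeof(xs))
println(typeof(dev))
```

Calling double_kernel!(xs) directly from the host, without @cuda, would raise a MethodError, because xs is a CuArray there and not a CuDeviceArray.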
Yeah, that seems reasonable. I’ve created another post that addresses the first part of this problem: how to call kernels vector-wise on a device matrix that update elements of a column vector. Once that is done, I will update this thread.