CuArray find first negative value along columns

I think you can, and probably should, do this by writing a custom kernel similar to the one behind CUDA.findfirst.
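Here is a minimal sketch of what such a kernel could look like. It is not how CUDA.jl implements findfirst internally. It assumes "along columns" means one result per column, uses one thread per column, and writes 0 when a column has no negative value. The names first_negative_kernel! and first_negative_per_column are made up for illustration.

```julia
using CUDA

# Hypothetical kernel: each thread handles one column and scans down its rows.
function first_negative_kernel!(out, A)
    j = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if j <= size(A, 2)
        idx = 0                      # assumption: 0 means "no negative value found"
        for i in 1:size(A, 1)
            if A[i, j] < 0
                idx = i
                break
            end
        end
        out[j] = idx
    end
    return nothing
end

# Hypothetical wrapper: launches enough blocks to cover every column.
function first_negative_per_column(A::CuMatrix)
    ncols = size(A, 2)
    out = CUDA.zeros(Int, ncols)
    threads = 256
    blocks = cld(ncols, threads)
    @cuda threads=threads blocks=blocks first_negative_kernel!(out, A)
    return out
end

# Example usage (requires a CUDA-capable GPU):
A = CUDA.randn(Float32, 1000, 500)
idx = first_negative_per_column(A)   # CuArray of row indices, one per column
```

With one thread per column, neighbouring threads read elements that are a full column apart in memory, so the accesses are not coalesced. If this turns out to be a bottleneck, a block-per-column layout with a reduction over the minimum matching index is probably worth trying, but I would start with the simple version and measure.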