Broadcast update multiple vectors at once

Simple question. I have a few vectors which hold the states of a list of particles. I update these with broadcast (which is nice since the vectors are on the GPU). For example:

    position⁰ .= position
    position .+= integrate.(position⁰, ...)

I need to check the states and potentially reset them, but I don’t know how to reset them all at the same time. I can do this:

    position .= enforce_bounds(position,...)

But now I need to set position⁰ = position only for the positions that were reset. Is there something like:

    (position⁰,position) .= enforce_bounds(position,...)

that will let me conditionally reset them both at once?

In this example, I could do the check in the other order, but in my code there are a bunch of other states and conditions so I’m looking for a general way to do this.

I would just write a loop.

A kernel loop? (I can’t scalar index on GPUs.)

Oh, sorry. I missed the GPU part. Maybe having both positions, initial and updated, in the same struct would make that GPU friendly?

True, I could make one struct with all the states and then use a vector of those. I think I would lose the easy broadcast update of each state like I have in the example, but I could refactor.

Actually I don’t know, but perhaps map and zip combined can do what you want on the GPU.
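One way to read that idea: let a single function return both new states as a tuple, broadcast it once, and then unpack the result. This is only a minimal sketch, run here on plain Arrays rather than CuArrays; `enforce_bounds_pair` and the bounds `0.0` and `1.0` are hypothetical stand-ins, since the real `enforce_bounds` is not shown in the thread. Note that it allocates a temporary array of tuples.

```julia
# Hypothetical per-particle rule (not the original enforce_bounds):
# clamp x into [lo, hi]; if x had to be clamped, the stored previous
# state x⁰ is reset to the new position as well.
function enforce_bounds_pair(x, x⁰, lo, hi)
    xnew = clamp(x, lo, hi)
    return xnew == x ? (x, x⁰) : (xnew, xnew)
end

position  = [-0.5, 0.2, 1.7, 0.9]
position⁰ = [ 0.1, 0.1, 0.8, 0.8]

# One broadcast checks the condition once per particle and
# produces a vector of (position, position⁰) tuples.
updated = enforce_bounds_pair.(position, position⁰, 0.0, 1.0)

# Unpack the tuples back into the two state vectors in place.
position  .= first.(updated)
position⁰ .= last.(updated)
```

The condition is evaluated in one place, so adding more states means returning a longer tuple and adding one unpacking line per state.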

This is a test with map!, not sure if it helps. It was a good exercise at least:

julia> using CUDA, StaticArrays, LinearAlgebra, BenchmarkTools  # norm and @btime need the last two

julia> function f!(x,y) 
           map!( (xel, yel) -> norm(xel) > norm(yel) ? xel : zero(yel),  y, x, y )
       end
f! (generic function with 1 method)

julia> xcpu = rand(SVector{3,Float64},10^4);

julia> ycpu = rand(SVector{3,Float64},10^4);

julia> xgpu = CuArray(xcpu);

julia> ygpu = CuArray(ycpu);

julia> @btime f!($xcpu, $ycpu);
  165.040 μs (0 allocations: 0 bytes)

julia> @btime f!($xgpu, $ygpu);
  8.409 μs (51 allocations: 2.38 KiB)

Maybe map! with something similar can be adapted to your case.

Thanks for the suggestion. But map! only seems to take one destination, so I’m not sure how that’s different than just using a broadcast update.

At the moment, I’ve reset one of the states to zero and then used that as a flag to update the others sequentially.
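A variation on that flag idea, sketched under assumptions: instead of overwriting one state with a sentinel value, compute a separate Bool mask first and apply it to each state with `ifelse`. The `velocity` vector and the bounds `lo` and `hi` are hypothetical, added only for illustration. The sketch runs on plain Arrays; it uses only element-wise broadcasts, so it should carry over to GPU arrays, though that is not tested here.

```julia
position  = [-0.5, 0.2, 1.7, 0.9]
position⁰ = [ 0.1, 0.1, 0.8, 0.8]
velocity  = [ 1.0, 2.0, 3.0, 4.0]   # hypothetical extra state
lo, hi = 0.0, 1.0                    # hypothetical bounds

# Evaluate the condition once, before any state is modified.
needs_reset = (position .< lo) .| (position .> hi)

# Apply the same mask to each state in turn; each line is one fused,
# in-place broadcast with no scalar indexing.
position  .= ifelse.(needs_reset, clamp.(position, lo, hi), position)
position⁰ .= ifelse.(needs_reset, position, position⁰)
velocity  .= ifelse.(needs_reset, zero(eltype(velocity)), velocity)
```

Because the mask is stored separately, no state has to give up a value (such as zero) to act as the flag, and the order of the updates only matters where one state is reset from another.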

Another option is to use KernelAbstractions.jl, which lets you write the loop you originally wanted as a kernel that runs multi-threaded on the CPU or on the GPU.
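A rough sketch of what such a kernel could look like, assuming a recent KernelAbstractions.jl (0.9-style API). The kernel name `reset_out_of_bounds!` and the clamping rule are hypothetical, since the actual conditions are not shown in the thread. It is run here on plain Arrays, which select the CPU backend; passing CuArrays should select the CUDA backend instead.

```julia
using KernelAbstractions

# Hypothetical rule: if a position is outside [lo, hi], clamp it and
# reset the stored previous position to the same value.
@kernel function reset_out_of_bounds!(position, position⁰, lo, hi)
    i = @index(Global)            # one work-item per particle
    @inbounds begin
        x = position[i]
        if x < lo || x > hi
            xnew = clamp(x, lo, hi)
            position[i]  = xnew   # both states are updated together,
            position⁰[i] = xnew   # inside the same condition
        end
    end
end

position  = [-0.5, 0.2, 1.7, 0.9]
position⁰ = [ 0.1, 0.1, 0.8, 0.8]

# The backend is inferred from the array type (CPU for Array).
backend = KernelAbstractions.get_backend(position)
kernel! = reset_out_of_bounds!(backend)
kernel!(position, position⁰, 0.0, 1.0; ndrange = length(position))
KernelAbstractions.synchronize(backend)   # wait for the kernel to finish
```

Inside the kernel, indexing with `i` is per-thread device code, so it does not run into the scalar-indexing restriction that applies to host-side loops over GPU arrays.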
