max.(v1, v2) on the GPU

What’s the best way to implement the element-wise maximum of two vectors of the same length on the GPU, using CUDA.jl or any other package?

julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> v2 = [4, 3, 2]
3-element Vector{Int64}:
 4
 3
 2

julia> max.(v1,v2)
3-element Vector{Int64}:
 4
 3
 3

You can assume Float32.

It’s just as you’ve written it, except with device arrays:

julia> using CUDA

julia> v1 = cu(Float32[1, 2, 3])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 1.0
 2.0
 3.0

julia> v2 = cu(Float32[4, 3, 2])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 4.0
 3.0
 2.0

julia> max.(v1, v2)
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 4.0
 3.0
 3.0
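
For comparison, here is what doing the same thing with a hand-written kernel might look like; this is a minimal sketch using CUDA.jl’s @cuda macro (the kernel name max_kernel! and the launch configuration are just illustrative). The broadcast above is simpler and should perform just as well, since it compiles to a single fused kernel anyway:

using CUDA

# One thread per element; the bounds check guards the partially filled
# final block when length(out) is not a multiple of the block size.
function max_kernel!(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = max(a[i], b[i])
    end
    return nothing
end

v1 = cu(Float32[1, 2, 3])
v2 = cu(Float32[4, 3, 2])
out = similar(v1)

threads = 256
blocks = cld(length(out), threads)  # enough blocks to cover every element
@cuda threads=threads blocks=blocks max_kernel!(out, v1, v2)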

Ahh… it was a Pluto “bug”.

I assumed the issue was the max operation, but I guess it was Pluto trying to do some scalar indexing for IO? It works fine in the REPL.

Might be an instance of “CuArrays don't seem to display correctly in VS code” (Issue #875, JuliaGPU/CUDA.jl on GitHub).

That’s probably Pluto.jl replacing the output stack, so GPUArrays.jl’s show methods (which first copy to the CPU so as not to trigger scalar iteration) are not used.
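
As a rough sketch of the same idea done by hand (assuming you just want to inspect the values): disallow scalar indexing so accidental device-side iteration fails loudly, and copy to the host before displaying, which is essentially what GPUArrays.jl’s show methods do:

using CUDA

CUDA.allowscalar(false)  # turn accidental scalar indexing into an error

v = cu(Float32[1, 2, 3])
Array(v)  # explicit device-to-host copy; safe to display in any frontend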

@fonsp FYI