max.(v1, v2) on the GPU

What’s the best way to implement the element-wise maximum of two vectors of the same length on the GPU, using CUDA.jl or any other package?

julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> v2 = [4, 3, 2]
3-element Vector{Int64}:
 4
 3
 2

julia> max.(v1, v2)
3-element Vector{Int64}:
 4
 3
 3

You can assume Float32.

It’s just as you’ve written it, except with device arrays:

julia> using CUDA

julia> v1 = cu(Float32[1, 2, 3])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 1.0
 2.0
 3.0

julia> v2 = cu(Float32[4, 3, 2])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 4.0
 3.0
 2.0

julia> max.(v1, v2)
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 4.0
 3.0
 3.0
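
If you ever want more control (e.g. to fuse this with other operations), you can also write the kernel by hand. Here is a minimal sketch, reusing the v1/v2 device arrays from above; the name elementwise_max! is just for illustration, and for a simple op like this the broadcast above compiles to essentially the same kernel:

using CUDA

# One thread per element: thread i writes max(a[i], b[i]) into out[i].
function elementwise_max!(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = max(a[i], b[i])
    end
    return nothing
end

out = similar(v1)
threads = 256
blocks = cld(length(v1), threads)  # enough blocks to cover every element
@cuda threads=threads blocks=blocks elementwise_max!(out, v1, v2)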

Ahh… it was a Pluto “bug”:


I assumed the issue was the max operation, but I guess it was Pluto trying to do some scalar indexing for IO?
Works fine in the REPL.
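
For now, copying back to the CPU before anything tries to display the result sidesteps the scalar indexing. A minimal workaround sketch:

result = max.(v1, v2)  # computed on the GPU
Array(result)          # explicit copy to the CPU, so display never indexes device memory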

Might be an instance of https://github.com/JuliaGPU/CUDA.jl/issues/875

That’s probably Pluto.jl replacing the output stack, so GPUArrays.jl’s show methods (which first copy to the CPU so as not to trigger scalar iteration) are not used.

@fonsp FYI