What’s the best way to implement the element-wise maximum of two vectors of the same length on the GPU, using CUDA.jl or any other package?
julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> v2 = [4, 3, 2]
3-element Vector{Int64}:
4
3
2
julia> max.(v1,v2)
3-element Vector{Int64}:
4
3
3
You can assume Float32.
It’s just as you’ve written it, except with device arrays:
julia> using CUDA
julia> v1 = cu(Float32[1, 2, 3])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
1.0
2.0
3.0
julia> v2 = cu(Float32[4, 3, 2])
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
4.0
3.0
2.0
julia> max.(v1, v2)
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
4.0
3.0
3.0
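To add a bit beyond the one-liner: if you call this repeatedly (e.g. in a training loop), you can broadcast into a preallocated output to avoid allocating a new `CuArray` each time. And for comparison, here is what an equivalent hand-written kernel would look like, though the fused broadcast already compiles to a single GPU kernel, so this is only a sketch of the mechanics, not something you'd need in practice (assumes a CUDA-capable GPU; `elmax_kernel!` is a name I made up for illustration):

```julia
using CUDA

v1 = cu(Float32[1, 2, 3])
v2 = cu(Float32[4, 3, 2])

# Preallocate once, then broadcast in place: one fused kernel, no allocation.
out = similar(v1)
out .= max.(v1, v2)

# Equivalent hand-written kernel, for comparison only:
function elmax_kernel!(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = max(a[i], b[i])
    end
    return nothing
end

@cuda threads=256 blocks=cld(length(out), 256) elmax_kernel!(out, v1, v2)
```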
Ahh… it was a Pluto “bug”. I assumed the issue was the max operation, but I guess it was Pluto trying to do some scalar indexing for IO? It works fine in the REPL.
maleadt (September 20, 2021, 8:11am):
That’s probably Pluto.jl replacing the output stack, so GPUArrays.jl’s show methods (which first copy to the CPU so as not to trigger scalar iteration) are not used.
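For anyone debugging a similar situation: CUDA.jl lets you forbid scalar iteration outright, so any code path that falls back to per-element indexing errors immediately instead of silently running slowly. A minimal sketch using CUDA.jl's `allowscalar`:

```julia
using CUDA

CUDA.allowscalar(false)        # scalar getindex on a CuArray now throws

v = cu(Float32[1, 2, 3])
# v[1] would raise an error here; to inspect values, copy to the CPU first:
Array(v)
```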