Why is Julia much slower than MATLAB on GPU computing?

Oh, and try using a scratch array to store the intermediate result:

julia> function main(N)
           x = CuArray(DGP(N))
           V0 = CUDA.ones(Float64, N); idx = ()
           a = 0.5
           max_iter = 100
           iter = 0
           tmp = x .+ a * V0'   # allocate the scratch array once, outside the loop
           while iter < max_iter
               V1 = V0
               tmp .= x .+ a * V1'   # overwrite the scratch array in place
               V0, idx = findmax(tmp, dims=2)
               iter += 1
           end
           return V0, idx, iter
       end

That should get rid of most of the memory management time.
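
To make the difference concrete, here is a minimal sketch of the two patterns side by side. It uses plain CPU `Array`s so it runs without a GPU; the assumption is that the same `.=` pattern carries over when `x` and `tmp` are `CuArray`s. The helper names `step_alloc` and `step_inplace!` are made up for illustration and are not part of the code above.

```julia
# Hypothetical helpers, CPU-only sketch (plain Arrays stand in for CuArrays).

# Allocates a fresh N×N result on every call.
function step_alloc(x, v, a)
    return x .+ a .* v'
end

# Writes the fused broadcast into a preallocated scratch array instead.
function step_inplace!(tmp, x, v, a)
    tmp .= x .+ a .* v'
    return tmp
end

N = 4
x = rand(N, N)
v = ones(N)
a = 0.5
tmp = similar(x)          # scratch array, allocated once

out = step_inplace!(tmp, x, v, a)
```

With `.=`, the loop reuses the same buffer on each iteration, so the allocator (and on the GPU, the memory pool) is not hit for the big intermediate every time around.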
