Why is array addition with oneAPI.jl slow?
julia> using BenchmarkTools, oneAPI
julia> c = rand(100,100);
julia> @btime $c.+1
4.950 μs (3 allocations: 78.21 KiB)
julia> a = oneArray(rand(100,100));
julia> @btime $a.+1
423.117 μs (526 allocations: 63.27 KiB)
Three (possible) reasons:
- Your array is small. You are likely measuring the fixed cost of launching the broadcast, not the addition itself.
- If you want the best performance, write a kernel.
- oneAPI.jl is not yet nearly as optimized as CUDA.jl.
However, once you get past the overhead, the raw compute performance of Julia on Intel GPUs is pretty good.
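A minimal sketch of the "write a kernel" suggestion, doing the same `.+ 1` by hand. It assumes oneAPI.jl's `@oneapi` launch macro with the `items` (work-items per group) and `groups` keywords, and `get_global_id()` returning a 1-based index, as in the package's own vector-add example. The kernel name `add_one!`, the work-group size of 256 and the switch to `Float32` are illustrative assumptions, not values from this thread.

```julia
using oneAPI

# Each work-item updates one element. get_global_id() is assumed to return
# a 1-based global index, as in the oneAPI.jl README's vadd example.
function add_one!(out, x)
    i = get_global_id()
    # More work-items may be launched than there are elements, so guard.
    if i <= length(out)
        @inbounds out[i] = x[i] + one(eltype(x))
    end
    return  # kernels must return nothing
end

# Float32 is an assumption: some Intel GPUs lack native Float64 support.
a = oneArray(rand(Float32, 100, 100))
b = similar(a)

n = length(a)
wgsize = 256               # assumed work-group size; check your device's limit
ngroups = cld(n, wgsize)   # enough groups to cover all n elements

@oneapi items=wgsize groups=ngroups add_one!(b, a)

# Copy the result back to the host for inspection.
result = Array(b)
```

For a 100×100 array the launch overhead will probably still dominate, so a hand-written kernel is only likely to pay off for much larger arrays or more work per element.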
I have an Intel GPU, so I can’t use CUDA.jl. Please improve the oneAPI.jl documentation; this oneAPI.jl documentation link appears to be broken. How do I write a kernel?