Array addition of oneAPI.jl slower

Why is oneAPI.jl array addition slow?

julia> using BenchmarkTools, oneAPI

julia> c = rand(100,100);
julia> @btime $c.+1
  4.950 μs (3 allocations: 78.21 KiB)

julia> a = oneArray(rand(100,100));
julia> @btime $a.+1
  423.117 μs (526 allocations: 63.27 KiB)

Three (possible) reasons:

  • Your array is small. You are likely measuring what it costs to launch the broadcast.
  • If you want the best performance, write a kernel.
  • oneAPI.jl is not nearly as optimized as CUDA.jl yet.

However, once you get beyond the overhead, the raw compute performance in Julia on Intel GPUs is pretty good.
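For instance, a larger array amortizes the launch cost, and synchronizing the device ensures the whole computation is timed rather than just the asynchronous launch. A sketch (assuming a recent oneAPI.jl that provides `oneAPI.@sync`):

```julia
using BenchmarkTools, oneAPI

# A larger array amortizes the kernel-launch overhead.
a = oneArray(rand(Float32, 4096, 4096))

# GPU operations run asynchronously; synchronize so the benchmark
# measures the full computation, not only the launch.
@btime oneAPI.@sync($a .+ 1)
```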

I have an Intel GPU, so I can’t use CUDA.jl. Please improve the oneAPI.jl documentation; the oneAPI.jl documentation link I found is broken. How do I use a kernel?

I would suggest starting by writing kernels in KernelAbstractions.jl. I assume that in the long run you don’t want to support only Intel GPUs anyway. With KernelAbstractions.jl you can target CUDA, oneAPI, ROCm, and Metal together.
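A minimal sketch of what such a kernel looks like (the kernel name `add_one!` is made up for illustration; the launch API follows KernelAbstractions.jl 0.9, where `get_backend` and `synchronize` are provided):

```julia
using KernelAbstractions

# An elementwise-addition kernel; the same code runs on CPU, CUDA,
# oneAPI, ROCm, and Metal backends.
@kernel function add_one!(out, a)
    i = @index(Global, Linear)
    @inbounds out[i] = a[i] + 1
end

# On an Intel GPU (assuming oneAPI.jl is installed):
# using oneAPI
# a = oneArray(rand(Float32, 100, 100))
# out = similar(a)
# backend = get_backend(a)                    # picks the oneAPI backend
# add_one!(backend)(out, a; ndrange = length(a))
# KernelAbstractions.synchronize(backend)
```

The kernel is compiled per backend at launch time, so the device code is native to whichever GPU the input arrays live on.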

Yes, but KernelAbstractions.jl doesn’t have documentation to start from. Will KernelAbstractions.jl be slower than oneAPI.jl due to the overhead of converting code, etc.?

https://juliagpu.github.io/KernelAbstractions.jl/stable/

Thanks. I think it would be better to link it on the GitHub - JuliaGPU/KernelAbstractions.jl: Heterogeneous programming in Julia page. I usually look in the About section on the right. Sorry.

As with nearly every Julia package, it is linked through the blue “docs:stable” badge on the README.

But you’re right that I also like to have the link in the About section at the top.

Where did you find it? From the navigation menu, if I go to “backends” and then “oneAPI”, I arrive at Intel oneAPI ⋅ JuliaGPU.

I found it in the About section of GitHub - JuliaGPU/oneAPI.jl: Julia support for the oneAPI programming toolkit.