Array addition of oneAPI.jl slower

Why is array addition with oneAPI.jl slow?

julia> using BenchmarkTools, oneAPI

julia> c = rand(100,100);
julia> @btime $c.+1
  4.950 μs (3 allocations: 78.21 KiB)

julia> a = oneArray(rand(100,100));
julia> @btime $a.+1
  423.117 μs (526 allocations: 63.27 KiB)
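One benchmarking caveat worth noting: GPU operations run asynchronously, so a plain `@btime` may measure only the time to enqueue the broadcast rather than to finish it. A minimal sketch, assuming oneAPI.jl provides an `oneAPI.@sync` macro analogous to CUDA.jl's `CUDA.@sync` (check the package for the exact name):

```julia
using BenchmarkTools, oneAPI

a = oneArray(rand(Float32, 100, 100))

# oneAPI.@sync (assumed here to mirror CUDA.jl's CUDA.@sync)
# blocks until the GPU has actually finished the operation,
# so the timing covers the full computation, not just the launch.
@btime oneAPI.@sync $a .+ 1
```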

Three (possible) reasons:

  • Your array is small. You are likely measuring what it costs to launch the broadcast.
  • If you want the best performance, use a kernel.
  • oneAPI.jl is by far not yet as optimized as CUDA.jl.

However, once you get beyond the overhead the raw compute performance in Julia on Intel GPUs is pretty good.
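To illustrate the overhead point: the launch cost is roughly constant per operation, so it dominates for a 100×100 array but is amortized over a much larger one. A hypothetical comparison (the `oneAPI.@sync` macro is assumed to exist, mirroring CUDA.jl; timings will depend on your hardware):

```julia
using BenchmarkTools, oneAPI

small = oneArray(rand(Float32, 100, 100))
big   = oneArray(rand(Float32, 4096, 4096))

# For the small array the fixed launch overhead dominates;
# for the big one, the time approaches raw memory bandwidth.
@btime oneAPI.@sync $small .+ 1
@btime oneAPI.@sync $big .+ 1
```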


I have an Intel GPU, so I can't use CUDA.jl. Please improve the documentation of oneAPI.jl — I see that this oneAPI.jl documentation link is broken. How do I use a kernel?

I would suggest starting by writing kernels with KernelAbstractions.jl. I assume that in the long run you don't want to support only Intel GPUs anyway. With KernelAbstractions.jl you can target CUDA, oneAPI, ROCm, and Metal together.
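To give a concrete idea of what that looks like, here is a minimal sketch of a KernelAbstractions.jl kernel that adds a scalar to each element. The kernel name `add_scalar!` is made up for illustration; the `@kernel`, `@index`, and `get_backend` APIs are from KernelAbstractions.jl:

```julia
using KernelAbstractions

# A backend-agnostic kernel: the same code runs on CPU,
# CUDA, oneAPI, ROCm, and Metal arrays.
@kernel function add_scalar!(out, @Const(a), b)
    i = @index(Global, Linear)
    @inbounds out[i] = a[i] + b
end

# Launching on an Intel GPU via oneAPI.jl:
using oneAPI
a = oneArray(rand(Float32, 100, 100))
out = similar(a)
backend = get_backend(a)               # picks the oneAPI backend from the array type
add_scalar!(backend)(out, a, 1f0; ndrange = length(a))
KernelAbstractions.synchronize(backend)
```

Because the backend is derived from the array type, the same launch code works unchanged if `a` is a `CuArray`, `ROCArray`, or plain CPU `Array`.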


Yes, but KernelAbstractions.jl doesn't have documentation to start with. Will KernelAbstractions.jl be slower than oneAPI.jl due to the overhead of converting code, etc.?

https://juliagpu.github.io/KernelAbstractions.jl/stable/


Thanks. I think it would be better to link it on the GitHub - JuliaGPU/KernelAbstractions.jl: Heterogeneous programming in Julia page — I was looking in the About section on the right. Sorry.

As with nearly every Julia package, it is linked through the blue "docs: stable" badge on the README.


But you're right: I would also like to have the link in the About section at the top.


Where did you find it? From the navigation menu, if I go to "Backends" and then "oneAPI", I arrive at Intel oneAPI ⋅ JuliaGPU.

I found it in the About section of GitHub - JuliaGPU/oneAPI.jl: Julia support for the oneAPI programming toolkit.