Array addition with oneAPI.jl is slower than on the CPU

Why is array addition with oneAPI.jl so slow?

julia> using BenchmarkTools, oneAPI

julia> c = rand(100,100);
julia> @btime $c.+1
  4.950 μs (3 allocations: 78.21 KiB)

julia> a = oneArray(rand(100,100));
julia> @btime $a.+1
  423.117 μs (526 allocations: 63.27 KiB)

Three possible reasons:

  • Your array is small. With a 100×100 array you are most likely measuring the fixed cost of launching the broadcast, not the addition itself.
  • If you want the best performance, write a kernel.
  • oneAPI.jl is not yet nearly as optimized as CUDA.jl.
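A minimal kernel sketch, modeled on the vector-addition example in the oneAPI.jl README. It assumes that `get_global_id()` returns a 1-based index and that `@oneapi` takes `items` (work-items per work-group) and `groups` (number of work-groups). The kernel name `add_scalar_kernel!` and the work-group size of 256 are illustrative choices, not values from the original thread. It also uses `Float32` because many Intel GPUs, especially integrated ones, may lack native `Float64` support.

```julia
using oneAPI

# Hypothetical kernel: each work-item adds the scalar `x` to one element of `a`.
function add_scalar_kernel!(out, a, x)
    i = get_global_id()            # global work-item index (assumed 1-based in oneAPI.jl)
    if i <= length(out)            # guard: the last work-group may extend past the array
        @inbounds out[i] = a[i] + x
    end
    return                         # kernels must return nothing
end

a   = oneArray(rand(Float32, 100, 100))  # Float32, since Float64 may be unsupported
out = similar(a)

n      = length(out)
items  = 256                       # work-items per work-group (assumed within device limits)
groups = cld(n, items)             # enough work-groups to cover every element

@oneapi items=items groups=groups add_scalar_kernel!(out, a, 1f0)

# Copying back to the host waits for the kernel to finish.
result = Array(out)
```

The bounds check in the kernel lets you round the launch up to whole work-groups without writing past the end of `out`.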

However, once you get past the launch overhead, the raw compute performance of Julia on Intel GPUs is pretty good.
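One way to check where the overhead stops dominating is to time the same broadcast at several sizes. This is a rough sketch with a hypothetical helper `bench`. It assumes that GPU work is submitted asynchronously and that `oneAPI.synchronize()` is available in your oneAPI.jl version to wait for the queue. Without such a synchronization step, the timing may only cover submission and not execution.

```julia
using BenchmarkTools, oneAPI

# Hypothetical helper: time `x .+ 1f0` on the CPU and GPU for an n×n Float32 matrix.
function bench(n)
    c = rand(Float32, n, n)
    a = oneArray(c)
    t_cpu = @belapsed $c .+ 1f0
    # Assumption: oneAPI.synchronize() blocks until queued GPU work completes,
    # so the measurement includes the broadcast's execution time.
    t_gpu = @belapsed begin
        $a .+ 1f0
        oneAPI.synchronize()
    end
    return t_cpu, t_gpu  # both in seconds
end

for n in (100, 1_000, 4_000)
    t_cpu, t_gpu = bench(n)
    println("n = $n: CPU $(round(t_cpu * 1e6; digits=1)) μs, GPU $(round(t_gpu * 1e6; digits=1)) μs")
end
```

No particular crossover point is implied here. It depends on the GPU, the driver, and the oneAPI.jl version.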


I have an Intel GPU, so I can’t use CUDA.jl. Please improve the oneAPI.jl documentation; the oneAPI.jl documentation link appears to be broken. How do I write and use a kernel?