oneAPI.jl @sync bug?

Hi,
I am trying the oneAPI.jl package and I have an issue with timing:

The following MWE reports an implausible speed-up, so I suspect that oneAPI.@sync may not be working properly…

using oneAPI
using BenchmarkTools

n=2^25

oneAPI.allowscalar(false)

x=rand(Float32,n)
y=rand(Float32,n)

ox=oneArray(x)
oy=oneArray(y)

vadd(x,y) = @. y+=x

vadd(x,y)
vadd(ox,oy)

@show sum(y),sum(oy)
 
tcpu = @belapsed $vadd($x,$y)
tgpu = @belapsed oneAPI.@sync $vadd($ox,$oy)

println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)") 

returns:

(sum(y), sum(oy)) = (3.3554044f7, 3.3554044f7)
tcpu=0.01556782, tgpu=6.7038e-5, SpUp=232.22381335958707

The same code with CUDA.jl seems to work fine:

using CUDA
using BenchmarkTools

n=2^25

CUDA.allowscalar(false)

x=rand(Float32,n)
y=rand(Float32,n)

ox=CuArray(x)
oy=CuArray(y)

vadd(x,y) = @. y+=x

vadd(x,y)
vadd(ox,oy)

@show sum(y),sum(oy)
 
tcpu = @belapsed $vadd($x,$y)
tgpu = @belapsed CUDA.@sync $vadd($ox,$oy)

println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)")

returns:

(sum(y), sum(oy)) = (3.3550048f7, 3.355005f7)
tcpu=0.015513745, tgpu=0.003838915, SpUp=4.041179604132939

Any hints?

There is no oneAPI.@sync; you're just calling Base.@sync, which does nothing for GPU work. Use synchronize() instead, but you can file a feature request issue to add such a macro too.
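
A minimal sketch of the corrected timing, assuming the script from the MWE above (only oneAPI is loaded there, so the unqualified synchronize() is unambiguous):

using oneAPI
using BenchmarkTools

# Time the kernel plus an explicit synchronization, so the queued
# GPU work actually finishes inside the measured region:
tgpu = @belapsed begin
    $vadd($ox,$oy)
    synchronize()
end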

Thank you very much!
synchronize() is fine. Can I use this function with CUDA.jl too?

I also found that the sum function was pretty slow on Gen9. Do you think there is still room for improvement in the reduction implementation in oneAPI.jl?
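
In case it is useful, this is roughly how it can be measured (a sketch reusing x and ox from the MWE; sum on a oneArray returns a host scalar, so it should synchronize implicitly):

tsum_cpu = @belapsed sum($x)
tsum_gpu = @belapsed sum($ox)  # scalar result is copied back to the host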

Yeah, but it’s a different function, so you’ll have to qualify which one you mean if you import both.
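
For example (a sketch; both packages export a synchronize function, so qualifying avoids the ambiguity):

using CUDA
using oneAPI

CUDA.synchronize()    # waits for pending work on the current CUDA stream
oneAPI.synchronize()  # waits for pending work on the current oneAPI queue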

Probably lots; oneAPI.jl hasn’t been profiled or optimized yet.
