Hi,

I am trying out the oneAPI.jl package and have run into an issue with timing.

The following MWE reports an implausibly large speed-up, so I suspect that oneAPI.@sync is not actually waiting for the GPU work to finish:

```
using oneAPI
using BenchmarkTools
n=2^25
oneAPI.allowscalar(false)
x=rand(Float32,n)
y=rand(Float32,n)
ox=oneArray(x)
oy=oneArray(y)
vadd(x,y) = @. y+=x
vadd(x,y)
vadd(ox,oy)
@show sum(y),sum(oy)
tcpu = @belapsed $vadd($x,$y)
tgpu = @belapsed oneAPI.@sync $vadd($ox,$oy)
println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)")
```

returns:

```
(sum(y), sum(oy)) = (3.3554044f7, 3.3554044f7)
tcpu=0.01556782, tgpu=6.7038e-5, SpUp=232.22381335958707
```
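As a rough sanity check on why this number looks wrong to me, here is the memory bandwidth implied by the timings above. This is only a sketch: it assumes that each call reads x and y and writes y once (three Float32 accesses per element) and ignores any caching effects. The helper name `bandwidth_gbs` is made up for this post.

```julia
# Effective memory bandwidth implied by the measured timings.
# Assumption: y .+= x reads x, reads y, and writes y once per element.
n = 2^25
bytes_moved = 3 * sizeof(Float32) * n    # total bytes touched per call

# Hypothetical helper: convert a time in seconds to GB/s for this workload.
bandwidth_gbs(t) = bytes_moved / t / 1e9

tcpu = 0.01556782    # CPU timing reported above
tgpu = 6.7038e-5     # oneAPI timing reported above

println("CPU: ", round(bandwidth_gbs(tcpu), digits=1), " GB/s")
println("GPU: ", round(bandwidth_gbs(tgpu), digits=1), " GB/s")
```

Under that assumption the CPU figure comes out at around 26 GB/s, which seems plausible, while the oneAPI figure is on the order of 6 TB/s, which looks far too high to be a real measurement of the kernel.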

The same code with CUDA.jl seems to work fine:

```
using CUDA
using BenchmarkTools
n=2^25
CUDA.allowscalar(false)
x=rand(Float32,n)
y=rand(Float32,n)
ox=CuArray(x)
oy=CuArray(y)
vadd(x,y) = @. y+=x
vadd(x,y)
vadd(ox,oy)
@show sum(y),sum(oy)
tcpu = @belapsed $vadd($x,$y)
tgpu = @belapsed CUDA.@sync $vadd($ox,$oy)
println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)")
```

returns:

```
(sum(y), sum(oy)) = (3.3550048f7, 3.355005f7)
tcpu=0.015513745, tgpu=0.003838915, SpUp=4.041179604132939
```

Any hints?