Hi,
I am trying out the oneAPI.jl package and I have an issue with timing.
The following MWE gives me implausibly large speed-ups, so I suspect that `oneAPI.@sync` may not be working properly:
```julia
using oneAPI
using BenchmarkTools

n = 2^25
oneAPI.allowscalar(false)
x = rand(Float32, n)
y = rand(Float32, n)
ox = oneArray(x)
oy = oneArray(y)

vadd(x, y) = @. y += x
vadd(x, y)
vadd(ox, oy)
@show sum(y), sum(oy)

tcpu = @belapsed $vadd($x, $y)
tgpu = @belapsed oneAPI.@sync $vadd($ox, $oy)
println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)")
```
This returns:

```
(sum(y), sum(oy)) = (3.3554044f7, 3.3554044f7)
tcpu=0.01556782, tgpu=6.7038e-5, SpUp=232.22381335958707
```
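As a cross-check (just a sketch, and it assumes `oneAPI.synchronize()` blocks until all queued kernels have finished), one could replace the macro with an explicit synchronize call inside the timed expression:

```julia
# Cross-check: force a blocking synchronize after the kernel launch,
# instead of relying on oneAPI.@sync, and compare the measured time.
tgpu_sync = @belapsed begin
    $vadd($ox, $oy)
    oneAPI.synchronize()
end
println("tgpu_sync=$tgpu_sync")
```

If `tgpu_sync` is much closer to a realistic kernel time than `tgpu`, that would point at `oneAPI.@sync` (or its interaction with `@belapsed` interpolation) as the culprit.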
The same code with CUDA.jl seems to work fine:
```julia
using CUDA
using BenchmarkTools

n = 2^25
CUDA.allowscalar(false)
x = rand(Float32, n)
y = rand(Float32, n)
ox = CuArray(x)
oy = CuArray(y)

vadd(x, y) = @. y += x
vadd(x, y)
vadd(ox, oy)
@show sum(y), sum(oy)

tcpu = @belapsed $vadd($x, $y)
tgpu = @belapsed CUDA.@sync $vadd($ox, $oy)
println("tcpu=$tcpu, tgpu=$tgpu, SpUp=$(tcpu/tgpu)")
```
This returns:

```
(sum(y), sum(oy)) = (3.3550048f7, 3.355005f7)
tcpu=0.015513745, tgpu=0.003838915, SpUp=4.041179604132939
```
Any hints?