Basic NN forward pass: CuArray fine, oneAPI array errors

Hello, I am “testing” GPU programming with a mock forward pass of a basic NN:

using Test, oneAPI, BenchmarkTools, LinearAlgebra


relu(x) = max(0,x)                        # elementwise activation
forward_layer(x,w,w0,f) = f.(w*x .+ w0)   # dense layer: activation.(weights*input .+ bias)


# three dense layers (two hidden + linear output); the result is written in place into y
function forward_network!(y,x,w1,w2,w3,w01,w02,w03,f=relu)
    x1 = forward_layer(x,w1,w01,f)
    x2 = forward_layer(x1,w2,w02,f)
    y  .= forward_layer(x2,w3,w03,identity)
    return nothing
end


(nd0,nd1,nd2,ndy) = (200,300,300,1)   # input, two hidden and output layer sizes
x   = rand(Float32,nd0);      y = Vector{Float32}(undef,ndy)
w1  = rand(Float32,nd1,nd0); w2 = rand(Float32,nd2,nd1); w3 = rand(Float32,ndy,nd2)
w01 = rand(Float32,nd1);    w02 = rand(Float32,nd2);    w03 = rand(Float32,ndy);

y
forward_network!(y,x,w1,w2,w3,w01,w02,w03,relu)
y

y_g   = oneArray{Float32}(undef,ndy)   # output buffer on the GPU
x_g   = oneArray(x)                    # inputs and weights moved to the GPU
w1_g  = oneArray(w1);  w2_g  = oneArray(w2);  w3_g  = oneArray(w3);
w01_g = oneArray(w01); w02_g = oneArray(w02); w03_g = oneArray(w03);

forward_network!(y_g,x_g,w1_g,w2_g,w3_g,w01_g,w02_g,w03_g,relu)
y_g

y ≈ Array(y_g)

When I use CUDA.jl/CuArray it seems to work fine, and I also get nice speed-ups when I increase the matrix sizes, but with oneAPI.jl/oneArray as above I get this error:

ERROR: AssertionError: value_type(decl) == value_type(entry)
Stacktrace:
  [1] emit_function!(mod::LLVM.Module, config::GPUCompiler.CompilerConfig{…}, f::Type, method::GPUCompiler.Runtime.RuntimeMethodInstance; ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/ESxwR/src/rtlib.jl:82
  [2] build_runtime(job::GPUCompiler.CompilerJob; ctx::LLVM.ThreadSafeContext

The error is raised on the forward_layer call.

  1. Any idea why this error occurs and how to work around it?
  2. Any other suggestions concerning the code? Is there anything seriously wrong in the code above? (One thing I wondered about is the allocations; see the sketch after this list.)
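
(On question 2: forward_network! allocates two new intermediate arrays on every call. Below is a rough, untested sketch of a pre-allocating variant using LinearAlgebra.mul!; the name forward_network_prealloc! and the x1/x2 buffers are mine, and I'm only assuming that mul! dispatches to the vendor BLAS for oneArray.)

# pre-allocating variant: the intermediate buffers x1 and x2 are passed in and reused
function forward_network_prealloc!(y,x1,x2,x,w1,w2,w3,w01,w02,w03,f=relu)
    mul!(x1, w1, x)          # x1 = w1 * x, written in place
    x1 .= f.(x1 .+ w01)      # add bias and apply activation without allocating
    mul!(x2, w2, x1)
    x2 .= f.(x2 .+ w02)
    mul!(y, w3, x2)          # linear output layer, written straight into y
    y .+= w03
    return nothing
end

# buffers allocated once, reused on every call
x1_g = oneArray{Float32}(undef,nd1); x2_g = oneArray{Float32}(undef,nd2)
forward_network_prealloc!(y_g,x1_g,x2_g,x_g,w1_g,w2_g,w3_g,w01_g,w02_g,w03_g,relu)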

[EDIT]
Never mind, I had instantiated an old (well… a few months old…) project. Running ] up in the Pkg REPL solved the first issue (roughly the commands shown below).
Still, I would like to have comments on the code, as it’s my first GPU code… thanks!
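
(For completeness, this is roughly what the fix amounts to outside the Pkg REPL; as far as I know Pkg.update() is the function equivalent of ] up:)

using Pkg
Pkg.activate(".")   # activate the project environment in the current directory
Pkg.update()        # update all packages in the environment, same as `] up`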

[EDIT 2]
I still have an issue with the oneAPI version of the code. The first time I run the function I get the “correct” result, but on further runs I get different results and sometimes Julia crashes, though never on the first run (and never with CuArray, where I always get the same result, as expected). Should I open a bug report on the oneAPI.jl package? The repeat check I'm using is sketched below.
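
(This is roughly how I'm checking it before filing anything: re-run the GPU forward pass a few times and compare each result against the CPU reference y computed above; Array(y_g) copies the result back to the host.)

# repeatability check: every GPU run should reproduce the CPU result y
gpu_results = map(1:10) do _
    forward_network!(y_g,x_g,w1_g,w2_g,w3_g,w01_g,w02_g,w03_g,relu)
    Array(y_g)                       # copy the result back to the host
end
all(r -> r ≈ y, gpu_results)         # true only if every run matches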