Brusselator example from DiffEqGPU won't run or performed badly after simple fix

Doing my first steps in Julia, I often find that good examples from “official” sources don’t work because things change so quickly. This stiff PDE example returning error
"Scalar indexing is disallowed "


I googled this type of error and wrapped finite differences with exclusion condition:

CUDA.allowscalar() do du[II[i, j, 1]] = α * (u[II[im1, j, 1]] + u[II[ip1, j, 1]] + u[II[i, jp1, 1]] + u[II[i, jm1, 1]] - 4u[II[i, j, 1]]) + B + u[II[i, j, 1]]^2 * u[II[i, j, 2]] - (A + 1) * u[II[i, j, 1]] + brusselator_f(x, y, t) end

Now it works but performance is abysmal. Utilization of my 4090 is ~20-30% and this example is FP32! Computational time ~35 min:

What else need to be changed to perform well?

Yeah if you are removing the allowscalar to get around an error then you are guaranteed bad performance.

@maleadt this seems like a regression in broadcast support in CUDA.jl?

1 Like

Hard to tell, e.g., it’s not clear what array types are being used here. If a CPU array is accidentally being used in that expression, it would fall back to a CPU broadcast doing scalar iteration, and that would be expected. If this did work fine on a previous version of CUDA.jl, please file an issue with a minimal example.

I dont know, i am newbie and just tried this “seems to be outdated” code.
I believe that possible fix is turn everything into CUDA arrays. But after that OrdinaryDiffEq.jl and Rosenbrock23() may not work because this is CPU things.

It’s just a broadcast over CuArrays

The MWE should be:

N = 1000
du = cu(rand(Float32,N,N))
u = cu(rand(Float32,N,N))
A = 1f0
B = 1f0
α = 1f0
t = 1f0
II = LinearIndices((N, N, 2))
kernel_u!.(Ref(du), Ref(u), A, B, α, Ref(II), CartesianIndices((N, N)), t)

though I don’t have a GPU on me right now to run it, that’s what is scalar indexing.

Can you please specify how to define that du and u now became CuArray inside this kernel?

kernel_u! = let N = N, xyd = xyd_brusselator, dx = step(xyd_brusselator)
    @inline function (du, u, A, B, α, II, I, t)
        i, j = Tuple(I)
        x = xyd[I[1]]
        y = xyd[I[2]]
        ip1 = limit(i + 1, N)
        im1 = limit(i - 1, N)
        jp1 = limit(j + 1, N)
        jm1 = limit(j - 1, N)
        du[II[i, j, 1]] = α * (u[II[im1, j, 1]] + u[II[ip1, j, 1]] + u[II[i, jp1, 1]] +
                           u[II[i, jm1, 1]] - 4u[II[i, j, 1]]) +
                          B + u[II[i, j, 1]]^2 * u[II[i, j, 2]] - (A + 1) * u[II[i, j, 1]] +
                          brusselator_f(x, y, t)
    end
end

I did in the head of program:

u = cu(zeros(Float32,N,N))
du = cu(zeros(Float32,N,N))

but this won’t translate inside this kernels.

What types are you getting in there?