DifferentialEquations + RecursiveArrayTools Broadcasting / CuArrays Kernel compilation error --

cmarcotte · February 3, 2020, 7:26pm

Hi,

Finally have access to a CUDA capable GPU, so I’m trying to accelerate some finite-difference PDE solves using CuArrays and DifferentialEquations.

I began with Chris’ simple reaction-diffusion demonstration here, and while I can get the CPU computation to complete for nearly any solver by removing the RecursiveArrayTools, I can not get any GPU computation to work for any solver. (Yes, the whole CUDA stack is installed and passed all tests.)

Setting CuArrays.allowscalar(false) and running the solver yields a forbidden indexing error, while permitting scalar operations generates a pretty inscrutable GPU compilation error message including “KernelError: passing and using non-bitstype argument”, (similar: here and here).

GPU compilation of #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float64,2,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(+),Tuple{Base.Broadcast.Extruded{Array{Float64,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}}) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(+),Tuple{Base.Broadcast.Extruded{Array{Float64,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.  .args is of type Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(+),Tuple{Base.Broadcast.Extruded{Array{Float64,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}} which is not isbits.
    .1 is of type Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(+),Tuple{Base.Broadcast.Extruded{Array{Float64,2},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(*),Tuple{Float64,Base.B...

(I can’t figure out how to format the error in a more clear way, sorry.)

Complete code below (very minor changes from Chris’ blog):

using DifferentialEquations, RecursiveArrayTools, LinearAlgebra

# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 256
const X = reshape([i for i in 1:N for j in 1:N],N,N)
const Y = reshape([j for i in 1:N for j in 1:N],N,N)
const α₁ = 1.0.*(X.>=4*N/5)

const Mx = Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1])
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

# Define the initial condition as normal arrays
A = zeros(N,N); B  = zeros(N,N); C = zeros(N,N); u0 = ArrayPartition((A,B,C))
#u0 = zeros(Float64,N,N,3)

const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N);
# Define the discretized PDE as an ODE function
function f(du,u,p,t)
  A,B,C = u.x
  dA,dB,dC = du.x
  # A = @view  u[:,:,1]
  # B = @view  u[:,:,2]
  # C = @view  u[:,:,3]
  #dA = @view du[:,:,1]
  #dB = @view du[:,:,2]
  #dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

# Solve the ODE
prob = ODEProblem(f,u0,(0.0,100.0))
sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)

#using Plots; pyplot()
#p1 = surface(X,Y,sol[end][:,:,1],title = "[A]")
#p2 = surface(X,Y,sol[end][:,:,2],title = "[B]")
#p3 = surface(X,Y,sol[end][:,:,3],title = "[C]")
#plot(p1,p2,p3,layout=grid(3,1))

#######################################################
### Solve the PDE using CuArrays
#######################################################

using CuArrays
gA = CuArray(Float32.(A)); gB = CuArray(Float32.(B)); gC = CuArray(Float32.(C)); gu0 = ArrayPartition((gA,gB,gC))
#gu0 = CuArray(Float32.(u0))
const gMx = CuArray(Float32.(Mx))
const gMy = CuArray(Float32.(My))
const gα₁ = CuArray(Float32.(α₁))


const gMyA = CuArray(zeros(Float32,N,N))
const gAMx = CuArray(zeros(Float32,N,N))
const gDA = CuArray(zeros(Float32,N,N))

function gf(du,u,p,t)
  A,B,C = u.x
  dA,dB,dC = du.x
  # A = @view  u[:,:,1]
  # B = @view  u[:,:,2]
  # C = @view  u[:,:,3]
  #dA = @view du[:,:,1]
  #dB = @view du[:,:,2]
  #dC = @view du[:,:,3]
  mul!(gMyA,gMy,A)
  mul!(gAMx,A,gMx)
  @. DA = D*(gMyA + gAMx)
  @. dA = DA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
#CuArray.allowslow(false) # makes sure none of the slow fallbacks are used
CuArrays.allowscalar(true) # so that the error generates
sol = solve(prob2,BS3(),save_everystep=false,save_start=false)

I would very much appreciate any help understanding why this doesn’t seem to work at all, since it’s been a headlining feature for the better part of a year.

ChrisRackauckas · February 3, 2020, 8:24pm

You forgot to change the cache for DA to the GPU. These two lines should be:

  @. gDA = D*(gMyA + gAMx)
  @. gdA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C

Then this script works. As for the allowscalar issue, I just put a patch in for RecursiveArrayTools: it was an ArrayPartition thing. Also, I now set this up on our GPU CI so it’ll be tracked.It should all be good to go.

Of course, I wouldn’t recommend implementing this exact problem like this, but it’ll work.

ChrisRackauckas · February 4, 2020, 7:54am

For reference, I just updated the blog post to now use the ROCK2 method which will be much faster than BS3 here. Cheers.

cmarcotte · February 4, 2020, 1:10pm

Thanks, Chris. Missing a device cache is a little embarrassing but I’m glad it was a simple fix.

In terms of legibility, is there a way to make the error message more descriptive for this sort of error?

cmarcotte · February 4, 2020, 1:23pm

Unfortunately v2.0.5 of RecursiveArrayTools yields the same error as before with ROCK2/ROCK4 (but BS3 works as expected), but thanks for having a look.

ChrisRackauckas · February 4, 2020, 2:42pm

What are you running? The code here works for me:

using OrdinaryDiffEq, RecursiveArrayTools, LinearAlgebra, Test, SparseArrays, SparseDiffTools, Sundials

# Define the constants for the PDE
const α₂ = 1.0
const α₃ = 1.0
const β₁ = 1.0
const β₂ = 1.0
const β₃ = 1.0
const r₁ = 1.0
const r₂ = 1.0
const D = 100.0
const γ₁ = 0.1
const γ₂ = 0.1
const γ₃ = 0.1
const N = 256
const X = reshape([i for i in 1:N for j in 1:N],N,N)
const Y = reshape([j for i in 1:N for j in 1:N],N,N)
const α₁ = 1.0.*(X.>=4*N/5)

const Mx = Tridiagonal([1.0 for i in 1:N-1],[-2.0 for i in 1:N],[1.0 for i in 1:N-1])
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0

# Define the initial condition as normal arrays
u0 = zeros(N,N,3)

const MyA = zeros(N,N);
const AMx = zeros(N,N);
const DA = zeros(N,N);
# Define the discretized PDE as an ODE function
function f(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

Iy = SparseMatrixCSC(I,N,N)
Ix = SparseMatrixCSC(I,N,N)
fJ = ones(3,3)
Dz = [1 0 0
      0 0 0
      0 0 0]
jacsparsity = kron(Dz,Iy,sparse(Mx)) + kron(Dz,sparse(My),Ix) + kron(fJ,Iy,Ix)
@time colorvec = matrix_colors(jacsparsity)

# Solve the ODE
ff = ODEFunction(f,colorvec=colorvec,jac_prototype=jacsparsity)
prob = ODEProblem(ff,u0,(0.0,100.0))
sol = solve(prob,BS3(),progress=true,save_everystep=false,save_start=false)
sol = solve(prob,ROCK2(),progress=true,save_everystep=false,save_start=false)
sol = solve(prob,TRBDF2(autodiff=false),progress=true,save_everystep=false,save_start=false)
@time sol = solve(prob,CVODE_BDF(linear_solver=:GMRES),progress=true,save_everystep=false)

println("CPU Times")
println("BS3")
@time sol = solve(prob,BS3(),progress=true,save_everystep=false)
println("ROCK2")
@time sol = solve(prob,ROCK2(),progress=true,save_everystep=false)
println("ROCK4")
@time sol = solve(prob,ROCK4(),progress=true,save_everystep=false)
println("TRBDF2")
@time sol = solve(prob,TRBDF2(autodiff=false),progress=true,save_everystep=false)
println("Rodas5")
@time sol = solve(prob,Rodas5(autodiff=false),progress=true,save_everystep=false)
println("CVODE_BDF")
@time sol = solve(prob,CVODE_BDF(linear_solver=:GMRES),progress=true,save_everystep=false)

using CuArrays
gu0 = CuArray(Float32.(u0))
const gMx = CuArray(Float32.(Mx))
const gMy = CuArray(Float32.(My))
const gα₁ = CuArray(Float32.(α₁))
const gMyA = CuArray(zeros(Float32,N,N))
const gAMx = CuArray(zeros(Float32,N,N))
const gDA = CuArray(zeros(Float32,N,N))

function gf(du,u,p,t)
   A = @view  u[:,:,1]
   B = @view  u[:,:,2]
   C = @view  u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(gMyA,gMy,A)
  mul!(gAMx,A,gMx)
  @. gDA = D*(gMyA + gAMx)
  @. dA = gDA + gα₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end

prob2 = ODEProblem(gf,gu0,(0.0,100.0))
CuArrays.allowscalar(false)
sol = solve(prob2,BS3(),save_everystep=false,save_start=false)
sol = solve(prob2,ROCK2(),save_everystep=false,save_start=false)
@test sol.t[end] == 100.0

println("GPU Times")
println("BS3")
@time sol = solve(prob2,BS3(),progress=true,save_everystep=false,save_start=false)
println("ROCK2")
@time sol = solve(prob2,ROCK2(),progress=true,save_everystep=false,save_start=false)

ChrisRackauckas · February 4, 2020, 2:53pm

Oh I see what you meant and I see where you got the issue from. I updated the blog post code and it should be good to go. Sorry for the issue! Cheers.

Topic		Replies	Views
DIfferentialEquations and GPU Performance gpu	15	4990	December 7, 2020
GPU problems with RecursiveArrayTools GPU question , gpuarrays , differentialequation	1	214	February 28, 2024
]test DiffEqGPU errors with " unsupported call to the Julia runtime" GPU first-steps	19	1618	September 20, 2019
Issue with 1.5 month old code based on CuArrays General Usage gpu	3	804	February 6, 2020
Error in diffeq_fd command of DiffEqFlux Machine Learning diffeq , flux	4	520	June 29, 2019

DifferentialEquations + RecursiveArrayTools Broadcasting / CuArrays Kernel compilation error --

Related topics