Some help with ArrayFire + DifferentialEquations

Noel_Araujo · December 16, 2019, 4:48pm

Hellos,

I noticed that with one line definition for ArrayFire.jl I can make it run together with DifferentialEquations.jl, but it is not working as expected.

DifferentialEquations is calling a generic multiplication function generic_matvecmul! instead of using ArrayFire *(matmul) and I would like to know how to fix it (if possible) to get all the performance of the GPU.

Here a minimal example of my problem:

# Don't worry about this part
# It is necessary to define the matrix "M"  below
function _sph2cart(θ, ϕ, r)
    N = length(θ)
    x = Array{Float64}(undef, N)
    y = Array{Float64}(undef, N)
    z = Array{Float64}(undef, N)

    x = r*cos(θ)*cos(ϕ)
    y = r*sin(θ)*cos(ϕ)
    z = r*sin(ϕ)
    return x,y,z
end

function _RandPoints_Sphere(N::Int64, Radius, MinDist )
    X = Array{Float64}(undef, N)
    Y = Array{Float64}(undef, N)
    Z = Array{Float64}(undef, N)
    dist_3 = MinDist^3
    iLoop = 1
    nValid = 0
    while nValid < N && iLoop < 10^6
        rvals = 2*rand() - 1
        elevation = asin(rvals)
        azimuth = 2π*rand()
        radii = Radius*(rand()^(1 ./3.))

        newX, newY, newZ = _sph2cart(azimuth, elevation, radii)
        if all(( (X[1:nValid].- newX).^2 + (Y[1:nValid].- newY).^2 + (Z[1:nValid].- newZ).^2 ) .>= dist_3)
            nValid = nValid + 1
            X[nValid] = newX
            Y[nValid] = newY
            Z[nValid] = newZ
        end
        iLoop = iLoop + 1
    end
    if nValid < N
        error("Could not generate all data after 10M interactions")
    end

    X_jk = X*ones(1,N) - ones(N,1)*X';
    Y_jk = Y*ones(1,N) - ones(N,1)*Y';
    Z_jk = Z*ones(1,N) - ones(N,1)*Z';
    R_jk = sqrt.(X_jk.^2 + Y_jk.^2 + Z_jk.^2);
    X_jk = 1;Y_jk=1; Z_jk=1; GC.gc()
    return [X Y Z],R_jk
end
# --------------------------------------------


using ArrayFire, BenchmarkTools, DifferentialEquations, LinearAlgebra

# We need to define any()  with AFArray
Base.any(f::Function, a::AFArray) = Bool(sum(f.(a)))

# Minimal code for what I want to do
function Foo!(du,u,p,t)
	M = p[1]
	Ω = p[2]
	du[:] = M*u + Ω
end

# just another attempt
Goo(u, p,t) = M_gpu*u + Ω_gpu


N = 10 # matrix size

r, R_jk = _RandPoints_Sphere(N, 10, 0 )
M = Array{Complex{Float64}}(undef, N,N)
@. M = -(1/2)*exp(im*R_jk)/(im*R_jk)
M[diagind(M)] .= 1im*1 - 1/2
Ω = -(im/2)*exp.(1im.*r[:,3]) 

M_gpu = AFArray(M)
Ω_gpu = AFArray(Ω)

p = []
push!(p, M)
push!(p, Ω)
p_gpu = []
push!(p_gpu, M_gpu)
push!(p_gpu, Ω_gpu)


println("----")
u0 = zeros(ComplexF64,length(Ω))
tspan = (0.0,100.0)
prob = ODEProblem(Foo!,u0,tspan,p)
@btime sol = DifferentialEquations.solve(prob,Tsit5())


u0_gpu = AFArray(zeros(ComplexF64,length(Ω)))
tspan = (0.0,100.0)
prob_gpu1 = ODEProblem(Foo!,u0,tspan, p_gpu)
@btime sol_gpu1 = DifferentialEquations.solve(prob_gpu1,Tsit5())


u0_gpu = AFArray(zeros(ComplexF64,length(Ω)))
tspan = (0.0,100.0)
prob_gpu2 = ODEProblem(Goo,u0,tspan)
@btime sol_gpu2 = DifferentialEquations.solve(prob_gpu2,Tsit5())

Results:


ArrayFire v3.6.4 (OpenCL, 64-bit Linux, build 1b8030c)
[0] AMD: gfx900, 8176 MB
----
  158.905 μs (1412 allocations: 293.16 KiB)
  830.935 ms (91112 allocations: 5.08 MiB)
  825.654 ms (99245 allocations: 5.99 MiB)

ChrisRackauckas · December 16, 2019, 4:54pm

ArrayFire has a memory pool where, instead of mutating them, it’s usually better to create new ones.

But anyways, your problem seems to be a typo:

You didn’t use u0_gpu, and instead had it on the CPU.

Also, you might want to use 32-bit Floats.

Noel_Araujo · December 16, 2019, 5:06pm

(edit)
Using F32 and fixing my typo, I got StackOverflowError using the Foo!

and Input type mismatch with Goo

ChrisRackauckas · December 16, 2019, 5:41pm

What are your input types into that operation? Print out typeof.

Noel_Araujo · December 16, 2019, 5:59pm

To print the type:

function Goo(u, p,t)
        println("M_gpu: ",typeof(M_gpu))
	println("u    : ",typeof(u))
	println("Ω_gpu: ",typeof(Ω_gpu))
	return M_gpu*u + Ω_gpu
end

and I run the code:

M_gpu: AFArray{Complex{Float32},2}
u    : AFArray{Complex{Float32},1}
Ω_gpu: AFArray{Complex{Float32},1}
M_gpu: AFArray{Complex{Float32},2}
u    : AFArray{Complex{Float32},1}
Ω_gpu: AFArray{Complex{Float32},1}
M_gpu: AFArray{Complex{Float32},2}
u    : AFArray{Complex{Float64},1}
Ω_gpu: AFArray{Complex{Float32},1}

at some moment, the type of u changes and creates the issue.

Converting everything again for ComplexFloat64 (and still using the Goo code) produces the same StackOverflowError as before with Foo!.

ChrisRackauckas · December 16, 2019, 6:11pm

I don’t see a point where in does in your printout?

Noel_Araujo · December 16, 2019, 6:16pm

In the last 3 printlns, the type of u becomes Float64, and before it was Float32

M_gpu: AFArray{Complex{Float32},2}
u    : AFArray{Complex{Float**64**},1}
Ω_gpu: AFArray{Complex{Float32},1}

ChrisRackauckas · December 16, 2019, 6:24pm

Make time Float32, i.e. tspan = (0f0,100f0)

Noel_Araujo · December 16, 2019, 6:42pm

Doing tspan with Float32 and rebooting my PC to make sure that nothing strange was happening, I still get the same StackOverflowError

ChrisRackauckas · December 16, 2019, 6:50pm

Whereat?

Noel_Araujo · December 16, 2019, 6:55pm

ChrisRackauckas · December 16, 2019, 7:02pm

See what happens if you make abstol=1f-6,reltol=1f-3.

Noel_Araujo · December 16, 2019, 7:25pm

I tested all the Solvers: RK4() I got the same error of type mismatch because u changes to Float64. Most all the others Explicity Runge-Kutta I got StackOverflowError, however the the simplest Euler() with dt=(1/2)^5 works.

ChrisRackauckas · December 16, 2019, 7:31pm

It looks to be something in adaptive time stepping. What’s your current code? I may be able to run it soon

Noel_Araujo · December 16, 2019, 7:36pm

using ArrayFire, BenchmarkTools, LinearAlgebra
# We need to define any()  wit AFArray
Base.any(f::Function, a::AFArray) = Bool(sum(f.(a)))

using DifferentialEquations
function Foo!(du,u,p,t)
	M = p[1]
	Ω = p[2]
	du[:] = M*u + Ω
end

function Goo(u, p, t)
	# println("M_gpu: ",typeof(M_gpu))
	# println("u    : ",typeof(u))
	# println("Ω_gpu: ",typeof(Ω_gpu))
	return M_gpu*u + Ω_gpu
end



N = 10
# Create Matrix for this example
r, R_jk = _RandPoints_Sphere(N, 10, 0 )
M = Array{Complex{Float32}}(undef, N,N)
@. M = -(1/2)*exp(im*R_jk)/(im*R_jk)
M[diagind(M)] .= 1im*1 - 1/2
Ω = ComplexF32.(-(im/2)*exp.(im.*r[:,3]))

M_gpu = AFArray(M)
Ω_gpu = AFArray(Ω)

p = []
push!(p, M)
push!(p, Ω)

println("----")
u0 = zeros(ComplexF32,length(Ω))
tspan = (0f0,100f0)
prob = ODEProblem(Foo!,u0,tspan,p)
@time sol = DifferentialEquations.solve(prob,Tsit5())
 

u0_gpu = AFArray(zeros(ComplexF32,length(Ω)))
tspan = (0f0,100f0)
prob_gpu2 = ODEProblem(Goo,u0_gpu,tspan)
@time sol_gpu2 = DifferentialEquations.solve(prob_gpu2, Euler(), dt=(1/2)^5)

I got warnings that I cannot see, but looks like both results (cpu and gpu) are working as expected ( to see this I need further codes, and they are not relevant here)


ArrayFire v3.6.4 (OpenCL, 64-bit Linux, build 1b8030c)
[0] AMD: gfx900, 8176 MB
----
  3.061343 seconds (13.05 M allocations: 624.245 MiB, 5.90% gc time)
3 warnings generated.
  1.506639 seconds (2.33 M allocations: 119.601 MiB, 1.70% gc time)

ChrisRackauckas · December 17, 2019, 9:54pm

Yeah that stack overflow…

https://github.com/JuliaGPU/ArrayFire.jl/blob/master/src/array.jl#L217-L225

That broadcast implementation is really not a good idea. I’m surprised any code works. The fix should be:

using DiffEqBase
@inline function DiffEqBase.calculate_residuals(ũ::AFArray, u₀::AFArray, u₁::AFArray,
                                    α, ρ, internalnorm,t)
    @. ũ / (α + max(norm(u₀,t), norm(u₁,t)) * ρ)
end

but, because their broadcast implementation doesn’t really work, it breaks the fusion of the expression:

MethodError: no method matching /(::AFArray{Complex{Float32},1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(+),Tuple{Float32,Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(max),Tuple{Float32,Float32}},Float32}}}})
Closest candidates are:
  /(::AFArray, !Matched::Number) at C:\Users\accou\.julia\packages\ArrayFire\nkN6J\src\array.jl:125
  /(::AbstractArray, !Matched::Number) at arraymath.jl:55
  /(::AbstractArray{T,1} where T, !Matched::Union{UnitLowerTriangular, UnitUpperTriangular}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\LinearAlgebra\src\triangular.jl:1948
  ...
broadcasted(::Function, ::AFArray{Complex{Float32},1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(+),Tuple{Float32,Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(max),Tuple{Float32,Float32}},Float32}}}}) at array.jl:220
calculate_residuals(::AFArray{Complex{Float32},1}, ::AFArray{Complex{Float32},1}, ::AFArray{Complex{Float32},1}, ::Float32, ::Float32, ::Function, ::Float32) at test.jl:158
perform_step!(::OrdinaryDiffEq.ODEIntegrator{Tsit5,false,AFArray{Complex{Float32},1},Float32,DiffEqBase.NullParameters,Float32,Float32,Float32,Array{AFArray{Complex{Float32},1},1},ODESolution{Complex{Float32},2,Array{AFArray{Complex{Float32},1},1},Nothing,Nothing,Array{Float32,1},Array{Array{AFArray{Complex{Float32},1},1},1},ODEProblem{AFArray{Complex{Float32},1},Tuple{Float32,Float32},false,DiffEqBase.NullParameters,ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,OrdinaryDiffEq.InterpolationData{ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Array{AFArray{Complex{Float32},1},1},Array{Float32,1},Array{Array{AFArray{Complex{Float32},1},1},1},OrdinaryDiffEq.Tsit5ConstantCache{Complex{Float32},Float32}},DiffEqBase.DEStats},ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},OrdinaryDiffEq.Tsit5ConstantCache{Complex{Float32},Float32},OrdinaryDiffEq.DEOptions{Float32,Float32,Float32,Float32,typeof(DiffEqBase.ODE_DEFAULT_NORM),typeof(opnorm),CallbackSet{Tuple{},Tuple{}},typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN),typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE),typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK),DataStructures.BinaryHeap{Float32,DataStructures.LessThan},DataStructures.BinaryHeap{Float32,DataStructures.LessThan},Nothing,Nothing,Int64,Array{Float32,1},Array{Float32,1},Array{Float32,1}},AFArray{Complex{Float32},1},Complex{Float32},Nothing}, ::OrdinaryDiffEq.Tsit5ConstantCache{Complex{Float32},Float32}, ::Bool) at low_order_rk_perform_step.jl:598
perform_step! at low_order_rk_perform_step.jl:578 [inlined]
solve!(::OrdinaryDiffEq.ODEIntegrator{Tsit5,false,AFArray{Complex{Float32},1},Float32,DiffEqBase.NullParameters,Float32,Float32,Float32,Array{AFArray{Complex{Float32},1},1},ODESolution{Complex{Float32},2,Array{AFArray{Complex{Float32},1},1},Nothing,Nothing,Array{Float32,1},Array{Array{AFArray{Complex{Float32},1},1},1},ODEProblem{AFArray{Complex{Float32},1},Tuple{Float32,Float32},false,DiffEqBase.NullParameters,ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,OrdinaryDiffEq.InterpolationData{ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Array{AFArray{Complex{Float32},1},1},Array{Float32,1},Array{Array{AFArray{Complex{Float32},1},1},1},OrdinaryDiffEq.Tsit5ConstantCache{Complex{Float32},Float32}},DiffEqBase.DEStats},ODEFunction{false,typeof(Goo),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},OrdinaryDiffEq.Tsit5ConstantCache{Complex{Float32},Float32},OrdinaryDiffEq.DEOptions{Float32,Float32,Float32,Float32,typeof(DiffEqBase.ODE_DEFAULT_NORM),typeof(opnorm),CallbackSet{Tuple{},Tuple{}},typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN),typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE),typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK),DataStructures.BinaryHeap{Float32,DataStructures.LessThan},DataStructures.BinaryHeap{Float32,DataStructures.LessThan},Nothing,Nothing,Int64,Array{Float32,1},Array{Float32,1},Array{Float32,1}},AFArray{Complex{Float32},1},Complex{Float32},Nothing}) at solve.jl:373
#__solve#331 at solve.jl:5 [inlined]
__solve at solve.jl:4 [inlined]
#solve_call#442(::Base.Iterators.Pairs{Union{},Union{},Tuple{},Na...

I would open an issue that it’s not possible to do the operation

t = 1f0
ρ = 1f0
α = 1f0
u₀ = AFArray(rand(ComplexF32,length(Ω)))
u₁ = AFArray(rand(ComplexF32,length(Ω)))
ũ =  AFArray(rand(ComplexF32,length(Ω)))
@. ũ / (α + max(norm(u₀,t), norm(u₁,t)) * ρ)

with ArrayFire.jl arrays.

ChrisRackauckas · December 17, 2019, 9:54pm

BTW, is there any reason why you’re using ArrayFire instead of CuArrays?

jpsamaroo · December 18, 2019, 12:18am

They’re using an AMD GPU. Hopefully soon ROCArrays will work well enough to run their code (if it doesn’t already).

ChrisRackauckas · December 18, 2019, 12:40am

I see. Okay, so we should probably get the ArrayFire broadcast implementation fixed anyways. What it’s trying to do is make sure it calls the right kernels, but it doesn’t actually turn off fusion. It needs to do this kind of thing:

https://github.com/JuliaLang/julia/blob/master/base/broadcast.jl#L1026-L1082

on *, norm, etc.

Noel_Araujo · December 18, 2019, 1:43pm

As stated before, I have a AMD GPU and ArrayFire is enough for me.

Topic		Replies	Views
DifferentialEquations + RecursiveArrayTools Broadcasting / CuArrays Kernel compilation error -- GPU diffeq	6	862	February 4, 2020
]test DiffEqGPU errors with " unsupported call to the Julia runtime" GPU first-steps	19	1611	September 20, 2019
Diffusion by using CUDA.jl, DifferentialEquations and DiffEqGPU GPU question , diffeq	6	1234	March 29, 2022
DiffEqGPU throws InvalidIR Error for simple example General Usage diffeq , differentialequation	3	329	February 14, 2023
Error in diffeq_fd command of DiffEqFlux Machine Learning diffeq , flux	4	520	June 29, 2019

Some help with ArrayFire + DifferentialEquations

Related topics