I know that there are GPUArrays, for which broadcasting and various linear algebra operations are overloaded so that they execute on the GPU.
Suppose that I define a function which works with GPUArrays and maybe with some constants. When this function is compiled, does it get compiled for the GPU as a whole, so that it can be executed on the GPU without data exchange with the CPU? (Provided that I transferred the data to the GPU beforehand.)
If I plug such a function into DifferentialEquations.jl as the right-hand side of an ODE and integrate it with, for example, a 4th-order Runge-Kutta method, will the whole Runge-Kutta scheme together with the right-hand side be compiled for the GPU, so that it can be executed as a whole without data exchange with the CPU?
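For concreteness, here is a rough sketch of the kind of setup I have in mind (the matrix, function names, and sizes are just placeholders):

using CUDA, DifferentialEquations, LinearAlgebra

A  = cu(rand(Float32, 100, 100))   # constant matrix, already on the GPU
u0 = cu(rand(Float32, 100))        # initial condition on the GPU

# right-hand side built only from operations overloaded for CuArrays
function f!(du, u, p, t)
    mul!(du, A, u)                 # GPU matrix-vector product
    du .*= -1f0                    # GPU broadcast
    return nothing
end

prob = ODEProblem(f!, u0, (0f0, 1f0))
sol  = solve(prob, RK4())          # does the whole RK4 loop stay on the GPU?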
Can you explain how it actually works?
I understand how dispatch works when you call, for example, a matrix-vector product function that is overloaded with a GPU implementation.
But how does Julia understand that it needs to compile the whole function for the GPU when it compiles a generic function that uses GPUArrays?
When you use a GPUArrays implementation such as CuArray, various array operations are defined that ensure the operation dispatches onto the GPU. Necessary conversions and kernel compilations happen automatically as part of the support code in GPUArrays. Many semi-generic kernels for common operations are also available within GPUArrays; GPUArrays and CUDA.jl use them to implement efficient GPU variants of well-known CPU-only operations from Base, the stdlibs, and external packages.
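For example (just a minimal sketch, with arbitrary arrays and operations), ordinary-looking array code picks the GPU methods purely based on the argument types:

using CUDA, LinearAlgebra

A = CUDA.rand(4, 4)        # CuArray{Float32, 2}
x = CUDA.rand(4)

y = A * x                  # dispatches to the CUBLAS-backed method for CuArrays
z = sin.(x) .+ 2f0 .* y    # the fused broadcast is compiled into a single GPU kernel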
Of course, not everything works seamlessly with CuArray; pretty much only the dispatches defined in GPUArrays and CUDA (and in other packages) will work, and many other dispatches will not (often resulting in an error). It’s not magic, just liberal use of multiple dispatch and a few hand-rolled routines. The most magical thing in the stack is probably GPUCompiler.jl, which compiles a given kernel (basically, a function with a given set of argument types) down to native GPU assembly by leveraging LLVM and Julia’s compiler.
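As a rough illustration of the level GPUCompiler.jl operates at (the kernel and launch configuration here are made up for the example): a plain Julia function plus a set of concrete argument types gets compiled down to GPU code and launched.

using CUDA

function axpy_kernel!(y, a, x)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(y)
        @inbounds y[i] = a * x[i] + y[i]
    end
    return nothing
end

x = CUDA.rand(1024)
y = CUDA.rand(1024)
@cuda threads=256 blocks=4 axpy_kernel!(y, 2f0, x)

# To inspect the native code generated for this (function, argument types) pair:
# @device_code_ptx @cuda threads=256 blocks=4 axpy_kernel!(y, 2f0, x)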
Also, regarding AST transformations: CUDA.jl and GPUCompiler.jl mostly do not do AST transforms (although CUDA and GPUArrays do have custom broadcasting implementations which transform a broadcast tree to a certain depth, e.g. replacing Base.sin with CUDA.sin automatically), but KernelAbstractions.jl (a package for writing generic kernels that run on the CPU, CUDA GPUs, and soon AMD and Intel GPUs) does use Cassette.jl to do transformations that are akin to AST transforms.
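A minimal KernelAbstractions.jl sketch (the kernel is made up, and the launch API has changed between versions, so treat the commented launch line as approximate):

using KernelAbstractions

@kernel function ka_saxpy!(y, a, x)
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

# The same kernel definition can be instantiated for different backends, e.g.
#   ka_saxpy!(CPU())(y, a, x; ndrange = length(y))
# on the CPU, or with a CUDA backend to run on CuArrays.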
Will it work if I have a callable struct which wraps CuArrays (some of these CuArrays are constant precomputed matrices and vectors, and some of them are used as caches for the computation)?
It doesn’t currently seem to work (see below), but generally we do now cudaconvert the kernel function in an attempt to convert closures to something GPU-compatible (see https://github.com/JuliaGPU/CUDA.jl/pull/625). I’d personally say it’s probably something worth supporting if possible. A possible workaround via Adapt.jl is sketched after the error output below.
julia> using CUDA
julia> struct MyStruct
           x
       end
julia> m = MyStruct(CuArray(rand(1)))
MyStruct([0.6895091553888575])
julia> function (m::MyStruct)(y)
           m.x[1] = y
           nothing
       end
julia> @cuda m(3.0)
ERROR: MethodError: no method matching cufunction(::MyStruct, ::Type{Tuple{Float64}})
Closest candidates are:
cufunction(::F, ::TT; name, kwargs...) where {F<:Function, TT<:Type} at /home/jpsamaroo/.julia/packages/CUDA/UeDEJ/src/compiler/execution.jl:285
Stacktrace:
[1] top-level scope
@ ~/.julia/packages/CUDA/UeDEJ/src/compiler/execution.jl:101
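As a possible workaround (a sketch only, with made-up names; it assumes an Adapt.jl rule and passes the wrapper as a kernel argument instead of calling it as the kernel itself):

using CUDA, Adapt

struct MyWrapper{T}
    x::T
end
Adapt.@adapt_structure MyWrapper   # adapt each field (CuArray -> device array) on conversion

# plain kernel function; the wrapper arrives as an argument, so it gets cudaconvert-ed
function set_first!(m, y)
    m.x[1] = y
    return nothing
end

m = MyWrapper(CuArray(rand(1)))
@cuda set_first!(m, 3.0)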