From CPU to GPU and back - compatible code for both

Dinari · December 27, 2018, 2:05pm

Hello all, I'm still new to Julia, learning and coding as I go.

I am now working on a model that I would like to make easy to configure for different settings:

Multi-CPU (single machine), multi-CPU (several machines), but also multi-GPU on a single machine or multiple machines.

I would like to avoid code duplication as much as possible and, if possible, create a single tuneable version.

Is this possible with the GPU packages supported today, or does it require a big overhead?

Thanks

mohamed82008 · December 27, 2018, 2:46pm

Hi, welcome to Julia!

I can only speak for my own experience. Firstly, it is obvious that each of the programming paradigms you mention has its intricacies that need to be respected and properly used in order to make the most out of each of them. For example, the way you would optimize a multi-threaded program is often not close to the way you would optimize a GPU program or a distributed program, beyond basic Julia gotchas.

However, the way Julia and multiple dispatch work allows for a kind of functional abstraction: one can define a new array type, e.g. CuArrays.CuArray or DistributedArrays.DArray, and then define some common functions on these types to hide most of the implementation details entailed in GPU and distributed programming. For example, if your program can be written as a series of simple maps and map-reductions, then this is straightforward to support in Julia without code duplication, as all these array types define map and mapreduce. Similarly, mul!, dot and some very basic linear algebra are supported by these array types. So all you have to do in this case is make sure you don’t over-constrain the inputs of your functions, constraining them at most to ::AbstractArray, which the above array types are subtypes of. One of the best examples of this is perhaps IterativeSolvers.cg!, which works for all these array types because it only uses functions that have been defined for all of them.
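As a rough illustration of that point, here is a minimal sketch of a function written only in terms of mul!, broadcasting and mapreduce, with its inputs constrained to abstract array types. The function name sum_squared_residuals and the toy data are made up for this example. It runs on plain CPU arrays as written; the assumption is that it would work unchanged when handed GPU or distributed arrays, provided the package in question implements these operations for its array type.

```julia
using LinearAlgebra  # standard library, provides mul!

# Hypothetical model step, written only with generic array operations
# (no scalar indexing loops), so it does not care where the data lives.
function sum_squared_residuals(A::AbstractMatrix, x::AbstractVector, b::AbstractVector)
    r = similar(b)      # same array type as `b`, so the result stays on the same device
    mul!(r, A, x)       # r = A * x, computed in place
    r .-= b             # broadcasting is defined for any AbstractArray
    return mapreduce(abs2, +, r)
end

# CPU usage with plain Arrays
A = [1.0 2.0; 3.0 4.0]
x = [1.0, 1.0]
b = [3.0, 6.0]
loss = sum_squared_residuals(A, x, b)

# Assumption: with a GPU array package loaded, the same call would be made
# with the inputs converted to that package's array type first.
```

The only design decision here is the loose type signature: nothing in the body mentions a concrete array type, so which implementation runs is decided by dispatch on the arguments.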

However, if your code is more involved and cannot be written in terms of those defined functions only, then you will have to use dispatch to do your own magic. This can involve a fair bit of code duplication which can be reduced by a careful definition of your building block functions and macros to be reused in all implementations.
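A sketch of what that can look like, using only Base so that it runs without a GPU: the high-level code calls a small building-block function, and only that building block gets more than one method. All names here (weighted_sum, normalised_score) are hypothetical. The specialised method below targets plain Vectors purely for illustration; in a real project the extra methods would more likely be written for a GPU or distributed array type.

```julia
# Generic fallback: relies only on broadcasting and sum, so it works for
# any AbstractVector type that supports those operations.
weighted_sum(w::AbstractVector, x::AbstractVector) = sum(w .* x)

# Specialised method, selected by dispatch when both inputs are CPU Vectors:
# an explicit loop that avoids the temporary array of the fallback.
function weighted_sum(w::Vector{T}, x::Vector{T}) where {T}
    length(w) == length(x) || throw(DimensionMismatch("w and x must have equal length"))
    s = zero(T)
    @inbounds @simd for i in eachindex(w, x)
        s += w[i] * x[i]
    end
    return s
end

# Shared high-level code: written once, unaware of which method runs.
normalised_score(w::AbstractVector, x::AbstractVector) = weighted_sum(w, x) / sum(w)

normalised_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # uses the Vector method
normalised_score(1:3, [1.0, 2.0, 3.0])              # uses the generic fallback
```

Keeping the type-specific code inside a few small functions like this is one way to limit the duplication: everything built on top of them is shared.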

Perhaps as Julia grows, more of these functions and abstractions will be already defined for you, so your off-the-shelf options will grow. But at least for now AFAIK, if you want to do something somewhat complicated on the GPU and/or multiple machines, you may have to get your hands dirty with the details of each programming paradigm.

ChrisRackauckas · December 27, 2018, 8:06pm

Packages like DiffEq are made to work with GPUArrays without actually having any extra code for handling GPUs, so the overhead can be essentially zero even on large projects. You just have to use the right atomics.

Dinari · December 29, 2018, 11:43am

Thank you both. I can move all the ‘expensive’ calculations to arrays as you mentioned. That leaves very few things, which are O(1) relative to the rest and do not really need the GPU (it would probably even make them slower, due to the cost of moving data to and from GPU memory). So I think I can make it work.
