How does GPU programming work (Knet example)?

jaynick · January 1, 2020, 4:52am

Can anyone give a high-level summary of how Julia and GPU programming work together? In which cases can we write Julia code and have it run on the GPU, and in which cases will it not work? I am confused.

I needed to clamp an image computed with Knet into the 0-1 range. There are some existing clamp functions, but for some reason they did not work with the KnetArray datatype. Since the image data is a KnetArray, I expected to be able to write some new code and have it run on the GPU. I wrote this:

  function clamp!(img)
      # Clamp each element of img into [0, 1] in place,
      # visiting the elements one at a time.
      @inbounds for i in eachindex(img)
          v = img[i]
          if v > 1f0
              img[i] = 1f0
          elseif v < 0f0
              img[i] = 0f0
          end
      end
      return img
  end

It worked, but it took about 20 seconds for a 300×300 image, whereas one of the existing clamp functions takes a fraction of a second on an Array{Float32}. This is so slow that I suspect the data is being pulled to the CPU for the operation and then pushed back.

I thought that Julia had a compiler that emits GPU code, but looking at the source of Knet's conv4 function, I see that it actually calls cuDNN.

The situation confuses me now. Can anyone give a simple explanation of what does and does not work when running Julia programs on the GPU? By “simple” I mean for someone who does not know how compilers work.

ChrisRackauckas · January 1, 2020, 10:53am

When you hear people talking about running Julia code on GPUs, it’s with CUDAnative and CuArrays. I don’t think that’s compatible with KnetArrays, which are an internal datatype of Knet and use hardcoded CUDA kernels written in CUDA C++. You can see in the source which kernels it generates:

https://github.com/denizyuret/Knet.jl/blob/master/deps/cuda14.jl

While this could in theory be interoperable with the “standard” Julia CUDA stack, I am not sure there’s an easy way to do it.
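For comparison, here is a minimal sketch of what the CuArrays route looks like from the user's side, assuming the clamp can be written as one fused broadcast instead of a scalar loop. `clamp01!` is a hypothetical name. The sketch runs on a plain CPU `Array`; with CuArrays loaded, passing a `CuArray` (e.g. from `cu(img)`) should make the same `.=` line compile into a single GPU kernel, but that part is an assumption not exercised here.

```julia
"""
    clamp01!(img)

Hypothetical helper: clamp every element of `img` into [0, 1] in place.
"""
function clamp01!(img::AbstractArray{T}) where {T<:AbstractFloat}
    # `.=` with a dotted call fuses into one pass over the whole array;
    # zero(T)/one(T) keep the element type (Float32 stays Float32).
    img .= clamp.(img, zero(T), one(T))
    return img
end

# CPU usage; on a GPU one would pass a CuArray instead (assumption).
img = Float32[-0.5 0.25; 1.5 0.75]
clamp01!(img)
```

Because the function only uses broadcasting, nothing in it refers to a specific array type; the array package decides how the broadcast is executed.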

jaynick · January 2, 2020, 4:00am

Thank you.

Two more follow-up questions:

Does Flux use CuArrays?
For someone who knows Knet, is there a “rule” for which Julia programs working with KnetArrays will produce good GPU code and which will not?

ChrisRackauckas · January 2, 2020, 4:30am

Yes.

denizyuret · January 4, 2020, 8:36am

As Chris mentioned, Knet uses a combination of handcrafted kernels and CUDA library calls to implement most of Julia’s array interface. As a general rule of thumb, built-in functions that work on the whole array (with or without broadcasting) should work on a KnetArray (e.g. tanh.(a), a .+ b, or norm(a)). Anything that requires a for loop over the array elements will need either a custom CUDA kernel or to be rewritten in terms of other array functions.
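Applied to the clamp from the question, the rule of thumb above suggests replacing the element loop with whole-array broadcasts such as max. and min. A minimal sketch follows; `clamp_loop!` and `clamp_array` are hypothetical names, and whether each broadcast op is implemented for KnetArray is an assumption worth checking in Knet's source. The demo runs on a plain `Array` and checks that both versions agree.

```julia
# Scalar-loop version, as in the question. Fine for Array, but on a GPU
# array each img[i] read/write typically becomes a separate small
# host/device transfer, which explains the slowness.
function clamp_loop!(img)
    @inbounds for i in eachindex(img)
        img[i] = min(max(img[i], 0f0), 1f0)
    end
    return img
end

# Whole-array version: two broadcasts and no per-element indexing from
# Julia, so a GPU array type can run them with its own kernels (assumption
# for KnetArray: max./min. with a scalar are supported).
clamp_array(img) = min.(max.(img, 0f0), 1f0)

x = Float32[-2.0, 0.5, 3.0, 1.0]
y = clamp_array(x)
z = clamp_loop!(copy(x))
```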

Finally, you don’t have to use KnetArray with Knet; you can use CuArrays with a slight performance penalty. None of the model-building, training, optimization, gradient, etc. code in Knet is (or should be :)) KnetArray-specific.
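One way to keep model code independent of the array type, as described above, is to pass the array type in as a parameter. This is a minimal sketch under that assumption; `init_params`, `predict`, and `atype` are hypothetical names. It uses `Array{Float32}` so it runs on the CPU; on a GPU one might pass `KnetArray{Float32}` or `CuArray{Float32}` instead, assuming those constructors accept a CPU array (not exercised here).

```julia
# Create parameters of the requested array type. The array type is a
# parameter, so the same code can target CPU or GPU arrays.
init_params(atype, nin, nout) = (atype(0.1 .* randn(nout, nin)), atype(zeros(nout)))

# Only generic array operations: matrix multiply and broadcasting.
predict((W, b), x) = tanh.(W * x .+ b)

atype = Array{Float32}          # e.g. KnetArray{Float32} or CuArray{Float32} on a GPU
w = init_params(atype, 3, 2)
x = atype(ones(3, 5))
yhat = predict(w, x)            # 2×5 matrix of Float32
```

Switching backends then means changing the single `atype` line rather than the model code.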

jaynick · January 6, 2020, 5:37am

That last fact (that CuArrays can also be used) is very interesting.
I will see whether it can speed up the code I mentioned.
