Does a function like relu need a kernel? When do you need to write a GPU kernel rather than "just" using CuArray?

The relu function works elementwise, returning the input where it is non-negative and zero elsewhere, i.e. relu(x) = ifelse.(x .> 0, x, 0).

Does the fact that it works elementwise mean I need to write a GPU kernel for it, or can I simply use CuArray on it?
Or perhaps I need to convert it to relu!(y, x) = begin y .= ifelse.(x .> 0, x, 0); return nothing end?
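For reference, here is how both forms from the question look as runnable code. This is just a sketch; the only change is substituting `zero(eltype(x))` for the literal `0` so the broadcast result keeps the input's element type:

```julia
# Out-of-place relu via broadcasting: the same expression runs on the CPU
# for an Array and is compiled into a fused GPU kernel for a CuArray.
relu(x) = ifelse.(x .> 0, x, zero(eltype(x)))

# In-place variant that writes into a preallocated output array:
function relu!(y, x)
    y .= ifelse.(x .> 0, x, zero(eltype(x)))
    return nothing
end

x = [-1.5, 0.0, 2.0]
y = similar(x)
relu!(y, x)          # y is now [0.0, 0.0, 2.0]
```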

Also, if I am writing a package and I have no idea whether the user has a CPU or a specific GPU, how can I write code that works independently of the hardware? The user may have data in a standard Array, a CuArray, a ROCArray, a oneArray or an MtlArray… and she just calls my function and the computation is done on the appropriate hardware.


No answer, just a follow-up question: is ifelse.(x .> 0, x, 0) better than max.(x, 0)?

A few years ago I implemented CUDA inference code for some deep learning layers and relu was just max.(x, 0).
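For what it's worth, a quick sketch comparing the two forms (names are made up for illustration). They agree on ordinary finite inputs; the visible difference is at NaN, which `max` propagates while the `ifelse` form maps to zero:

```julia
relu_ifelse(x) = ifelse.(x .> 0, x, zero(eltype(x)))
relu_max(x)    = max.(x, zero(eltype(x)))

x = [-2.0, -0.0, 0.0, 3.5]
relu_ifelse(x) == relu_max(x)       # same result on finite inputs

relu_max([NaN])                     # NaN propagates through max
relu_ifelse([NaN])                  # NaN > 0 is false, so this yields zero
```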


CUDA.jl has some useful docs on this. Writing it in terms of broadcasting is array programming, which uses the GPU when the input is a CuArray; if you can express the operation in terms of operations like that, then you don't need to write a kernel. So broadcasting is a good way to go (and it supports accelerators other than CUDA; GPUArrays.jl is the generic package, I believe). If you do end up needing to write kernels and want to do so in a generic way, then KernelAbstractions.jl is the package for that.
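To illustrate the kernel route, here is a minimal KernelAbstractions.jl sketch of relu (assuming KernelAbstractions is installed; the kernel and function names are my own). The same code launches on whatever backend the input array lives on:

```julia
using KernelAbstractions

# One kernel definition; @Const marks x as read-only.
@kernel function relu_kernel!(y, @Const(x))
    i = @index(Global)
    @inbounds y[i] = max(x[i], zero(eltype(x)))
end

function ka_relu!(y, x)
    backend = get_backend(x)            # CPU() for Array, CUDABackend() for CuArray, ...
    kernel = relu_kernel!(backend)
    kernel(y, x; ndrange = length(x))
    KernelAbstractions.synchronize(backend)
    return y
end

y = similar([-1.0, 0.0, 2.0])
ka_relu!(y, [-1.0, 0.0, 2.0])           # y is now [0.0, 0.0, 2.0]
```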
