[blog post] Introduction to GPU programming

You could actually read the blogpost, where I explain exactly that in the section Writing GPU Kernels :stuck_out_tongue:

using GPUArrays, CuArrays
# Overloading the Julia Base map! function for GPUArrays
function Base.map!(f::Function, A::GPUArray, B::GPUArray)
    # our function that will run on the gpu
    function kernel(state, f, A, B)
        # If launch parameters aren't specified, linear_index gets the index
        # into the Array passed as second argument to gpu_call (`A`)
        i = linear_index(state)
    		if i <= length(A)
          @inbounds A[i] = f(B[i])
        end
        return
    end
    # call kernel on the gpu
    gpu_call(kernel, A, (f, A, B))
end

Let’s try to figure out what this is doing! In simple terms, this will call the julia function kernel length(A) times in parallel on the GPU. Each parallel invocation of kernel has a thread index, which we can use to safely index into the arrays A and B. If we calculated our own indices instead of using linear_index, we’d need to make sure that we don’t have multiple threads reading and writing to the same array locations. So, if we wrote this in pure Julia with threads, an equivalent version would look like this: