Using Math Functions Inside a CUDA Kernel

parallel

#1
using CUDAdrv, CUDAnative

function kernel_a(a)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    a[i] = tanh(a[i])
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)
len = 100
a = rand(Float32, len)
d_a = CuArray(a)
function gpu()
    @cuda (1,100) kernel_a(d_a)
end
gpu()

The above code gives me “error in running finalizer: CUDAdrv.CuError(code=700, meta=nothing)”, along with various other errors, whenever I use a math function such as tanh() inside the kernel.

Is this the right way to use math functions?


#2

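Use the intrinsics from CUDAnative: inside the kernel, call CUDAnative.tanh(x) instead of tanh(x). See https://github.com/JuliaGPU/CUDAnative.jl/issues/27
Hopefully this will be fixed for Julia 1.0, so that the explicit CUDAnative prefix is no longer needed.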
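
For reference, here is a minimal sketch of the kernel from the question with the intrinsic swapped in. It assumes the same CUDAdrv/CUDAnative versions as the original post (the ones that accept the `@cuda (blocks, threads)` launch syntax) and a working CUDA device 0. The copy back with `Array(d_a)` and the final check against the CPU result are additions for illustration, not part of the original code.

```julia
using CUDAdrv, CUDAnative

function kernel_a(a)
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    # Use the device intrinsic from CUDAnative rather than Base.tanh,
    # which is not compiled for the GPU here.
    a[i] = CUDAnative.tanh(a[i])
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)

len = 100
a = rand(Float32, len)
d_a = CuArray(a)              # upload the input to the GPU

# One block of `len` threads, so every thread index is in bounds.
@cuda (1,len) kernel_a(d_a)

result = Array(d_a)           # copy the result back to the host

# Illustrative check (an addition): the GPU result should match the CPU one
# up to floating-point tolerance.
@assert result ≈ tanh.(a)
```

The same pattern should apply to the other math functions that CUDAnative exposes as intrinsics: qualify the call with the module name inside the kernel, and keep the plain Base function for host-side code.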