I wanted to compute a factorial inside a kernel to implement a statistical measure (a Poisson likelihood) on the GPU, but it turns out that factorial isn't supported in GPU code:
using CUDA

function gpu_fac(y, x)
    for i = 1:length(y)
        @inbounds y[i] += factorial(x[i])
    end
    return nothing
end

N = 10
x = CUDA.fill(3, N)
y = CUDA.fill(1, N)

@cuda gpu_fac(y, x)
ERROR: LoadError: InvalidIRError: compiling kernel gpu_fac(CuDeviceVector{Int64, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Would it make sense to support factorial on the GPU?
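In the meantime, one workaround (a sketch, not an official API) is to avoid Base.factorial, whose table lookup and overflow checks involve throw paths the GPU compiler apparently can't lower, and use a plain multiplicative loop instead, which compiles to straightforward device code. device_factorial below is a hypothetical helper name:

```julia
using CUDA

# Device-friendly factorial: a plain loop with no lookup table
# and no throw, so it lowers to GPU-compatible code.
# (Hypothetical helper; overflow for large n is the caller's
# responsibility, since there is no error check here.)
function device_factorial(n::Integer)
    r = one(n)
    for i in 2:n
        r *= i
    end
    return r
end

function gpu_fac(y, x)
    for i = 1:length(y)
        @inbounds y[i] += device_factorial(x[i])
    end
    return nothing
end

N = 10
x = CUDA.fill(3, N)
y = CUDA.fill(1, N)

@cuda gpu_fac(y, x)
# each y[i] should now hold 1 + 3! = 7
```

For a Poisson likelihood it may also be numerically safer to work with log-factorials rather than factorials, e.g. something like loggamma(x + 1) on floating-point values; if I remember correctly, CUDA.jl exposes libdevice math functions such as CUDA.lgamma inside kernels, which would sidestep integer overflow for large counts entirely.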