Factorial on GPU

I wanted to perform a factorial calculation within a kernel to implement a statistical measure (Poisson likelihood) on the GPU.
It turns out that factorial isn’t supported on the GPU:

using CUDA

function gpu_fac(y, x)
    # Single-thread kernel: loop over every element and add x[i]! to y[i]
    for i = 1:length(y)
        @inbounds y[i] += factorial(x[i])
    end
    return nothing
end

N = 10
x = CUDA.fill(3, N)
y = CUDA.fill(1, N)

@cuda gpu_fac(y, x)

ERROR: LoadError: InvalidIRError: compiling kernel gpu_fac(CuDeviceVector{Int64, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR

Would it make sense to have factorial support on the GPU?

Not really. For integers, only about 20 factorials fit in the range: the largest one that fits is 12! for Int32 and 20! for Int64. Besides, at most 20 scalar multiplications are definitely not worth going to the GPU for.
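To make the range argument concrete, here is a minimal CPU-side sketch. The names `max_factorial_arg`, `FACTORIALS` and `table_factorial` are made up for illustration and are not part of Base or CUDA.jl. The first function finds the largest n whose factorial fits in a given integer type; the table shows that the whole Int64 range is just 21 precomputed values. Whether the table lookup compiles inside a `@cuda` kernel is an assumption that would need checking.

```julia
# Largest n such that n! fits in integer type T, found by checked multiplication.
function max_factorial_arg(::Type{T}) where {T<:Integer}
    acc = one(T)
    n = 0
    while true
        # mul_with_overflow returns (result, overflowed::Bool)
        next, overflowed = Base.Checked.mul_with_overflow(acc, T(n + 1))
        overflowed && return n
        acc = next
        n += 1
    end
end

# Precomputed 0! through 20!, i.e. every factorial representable in Int64.
const FACTORIALS = ntuple(i -> factorial(i - 1), 21)

# Hypothetical helper: plain table lookup with no error path.
# Only valid for 0 <= n <= 20; anything else is the caller's responsibility.
@inline table_factorial(n::Integer) = @inbounds FACTORIALS[n + 1]

println(max_factorial_arg(Int32))  # 12
println(max_factorial_arg(Int64))  # 20
println(table_factorial(5))        # 120
```

A lookup like this avoids the multiplications entirely, which is another way of seeing that there is very little work here to parallelize.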

If you’re thinking of using floating-point numbers, you’re also out of luck, since the product is no longer exact:

julia> reduce(*, 1:1.0:25) |> BigInt

julia> factorial(BigInt(25))
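
As a rough way to see where Float64 stops being exact, the following sketch compares a running Float64 product against the exact BigInt factorial and reports the first n at which they differ. The function name `first_inexact_factorial` is invented for this example; the default limit of 170 is chosen because 170! is the largest factorial that is still finite in Float64.

```julia
# First n at which the Float64 running product 1.0 * 2 * ... * n
# differs from the exact n!, or `nothing` if none is found up to `limit`.
function first_inexact_factorial(limit::Int = 170)
    acc = 1.0
    for n in 1:limit
        acc *= n
        # The product is always integer-valued, so BigInt(acc) converts it exactly
        BigInt(acc) == factorial(BigInt(n)) || return n
    end
    return nothing
end

println(first_inexact_factorial())  # 23
```

So a Float64 product is exact only up to 22!, which is barely further than Int64 gets.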