# KernelError: recursion is currently not supported

### Background:

I have reduced my actual task to a simple example for illustration:

For every element of array x, I build a third-order (3×3) matrix from it, compute that matrix's determinant, and store the determinant in array y.

$$x=\left[\begin{array}{c} {x_{1}} \\ {x_{2}} \\ {\vdots} \\ {x_{i}} \end{array}\right], \operatorname{mat}[i]=\left[\begin{array}{ccc} {x[i]+1} & {x[i]+2} & {x[i]+5} \\ {x[i]+1} & {x[i]+0} & {x[i]+4} \\ {x[i]+2} & {x[i]+3} & {x[i]+2} \end{array}\right], y=\left[\begin{array}{c} {\det(\operatorname{mat}[1])} \\ {\det(\operatorname{mat}[2])} \\ {\vdots} \\ {\det(\operatorname{mat}[i])} \end{array}\right]$$

### My code:

```julia
using CuArrays
using CUDAnative
using LinearAlgebra

function test(x)
    mat = [x+1 x+2 x+5
           x+1 x+0 x+4
           x+2 x+3 x+2]
    c = det(mat)
    return c
end

function kernel!(x, y)
    index  = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = index:stride:size(x, 1)
        y[i] = test(x[i])
    end
    return nothing
end

x = rand(10000)
y = zeros(10000)
d_x = cu(x)
d_y = cu(y)

numblocks = ceil(Int, size(x, 1) / 256)
@cuda threads=256 blocks=numblocks kernel!(d_x, d_y)
```


### Result:

```
GPU compilation of kernel!(CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CuDeviceArray{Float32,1,CUDAnative.AS.Global}) failed
KernelError: recursion is currently not supported

Try inspecting the generated code with any of the @device_code_... macros.

Stacktrace:
 [1] mapreduce_impl at reduce.jl:148 (repeats 2 times)
 [2] det at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\LinearAlgebra\src\triangular.jl:2525
 [3] det at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\LinearAlgebra\src\generic.jl:1421
 [4] kernel! at In[3]:2

Stacktrace:
 [1] (::CUDAnative.var"#hook_emit_function#100"{CUDAnative.CompilerJob,Array{Core.MethodInstance,1}})(::Core.MethodInstance, ::Core.CodeInfo, ::UInt64) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\irgen.jl:102
 [2] compile_method_instance(::CUDAnative.CompilerJob, ::Core.MethodInstance, ::UInt64) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\irgen.jl:149
 [3] macro expansion at C:\Users\zenan\.julia\packages\TimerOutputs\7Id5J\src\TimerOutput.jl:228 [inlined]
 [4] irgen(::CUDAnative.CompilerJob, ::Core.MethodInstance, ::UInt64) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\irgen.jl:163
 [5] macro expansion at C:\Users\zenan\.julia\packages\TimerOutputs\7Id5J\src\TimerOutput.jl:228 [inlined]
 [6] macro expansion at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\driver.jl:99 [inlined]
 [7] macro expansion at C:\Users\zenan\.julia\packages\TimerOutputs\7Id5J\src\TimerOutput.jl:228 [inlined]
 [8] #codegen#156(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\driver.jl:98
 [9] #codegen at .\none:0 [inlined]
 [10] #compile#155(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\compiler\driver.jl:47
 [11] #compile#154 at .\none:0 [inlined]
 [12] #compile at .\none:0 [inlined] (repeats 2 times)
 [13] macro expansion at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\execution.jl:392 [inlined]
 [14] #cufunction#200(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(kernel!), ::Type{Tuple{CuDeviceArray{Float32,1,CUDAnative.AS.Global},CuDeviceArray{Float32,1,CUDAnative.AS.Global}}}) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\execution.jl:359
 [15] cufunction(::Function, ::Type) at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\execution.jl:359
 [16] top-level scope at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\execution.jl:176
 [17] top-level scope at gcutils.jl:91
 [18] top-level scope at C:\Users\zenan\.julia\packages\CUDAnative\Phjco\src\execution.jl:173
 [19] top-level scope at In[5]:2
```


### Question:

The above example is very similar to the task I actually want to complete. Since I am new to GPU programming, I cannot make sense of the error message. How can this error be fixed?

You are calling det on the GPU, which in turn calls mapreduce. That kind of functionality is not available within a kernel, where you can only do relatively simple computations. The array allocation in test is also not possible in a kernel. Just hard-code the expression that calculates the determinant of your 3x3 matrix. Alternatively, you could try StaticArrays, whose arrays can be allocated in a kernel (since they are stack-based), and it looks like it provides a det method.
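A minimal sketch of the hard-coded approach (the names `det3` and `kernel!` here are illustrative): the 3x3 determinant is expanded by cofactors, so nothing is allocated inside the kernel and no LinearAlgebra routine is called.

```julia
using CuArrays, CUDAnative

# Determinant of the 3x3 matrix built from the scalar x, expanded by
# cofactors; only scalar arithmetic, so it is safe inside a kernel.
function det3(x)
    a11, a12, a13 = x + 1, x + 2, x + 5
    a21, a22, a23 = x + 1, x + 0, x + 4
    a31, a32, a33 = x + 2, x + 3, x + 2
    return a11 * (a22 * a33 - a23 * a32) -
           a12 * (a21 * a33 - a23 * a31) +
           a13 * (a21 * a32 - a22 * a31)
end

function kernel!(x, y)
    index  = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = index:stride:size(x, 1)
        y[i] = det3(x[i])   # scalar work only; no array allocation
    end
    return nothing
end
```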


I see the problem now. What should I do if I still need to create a variable mat inside test(x), i.e. a matrix whose entries depend on the parameter x?
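As suggested above, StaticArrays may cover this case: an `SMatrix` is stack-allocated, so it can be constructed inside a kernel, and StaticArrays provides an unrolled `det` for small fixed-size matrices. A hedged sketch (untested on my end):

```julia
using CuArrays, CUDAnative
using LinearAlgebra, StaticArrays

# Build mat as a stack-allocated SMatrix instead of a regular Array,
# so no heap allocation happens inside the kernel.
function test(x)
    mat = @SMatrix [x+1 x+2 x+5;
                    x+1 x+0 x+4;
                    x+2 x+3 x+2]
    return det(mat)   # StaticArrays' det, unrolled for 3x3
end
```

The rest of the kernel and launch code can stay as in the original example; only `test` changes.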