Invalid LLVM IR error using CUDA

Hi all. I have run into a strange problem and would like to ask for help.
The CPU function below works, but the GPU function fails.
Why? Is this kind of operator overloading allowed in a CUDA kernel?

It seems that CUDA is confusing Int32 and Int64.

Thank you in advance.

using CUDA

import Base: +

struct Test{N} end
Test(N) = Test{N}()
+(::Test{M}, ::Test{N}) where {M,N} = Test{M + N}()

function test_gpu()
    i = threadIdx().x
    Test(i) + Test(4)
    return nothing
end

CUDA.@cuda test_gpu()

function test_cpu()
    i = 1
    Test(i) + Test(4)
end

test_cpu()
ERROR: InvalidIRError: compiling MethodInstance for test() resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Reason: unsupported call to an unknown function (call to julia.get_gc_frame_slot)
Reason: unsupported call to an unknown function (call to jl_f_apply_type)
Stacktrace:
 [1] Test
   @ ~/dev/MR/sme.jl:6
 [2] test
   @ ~/dev/MR/sme.jl:11
Reason: unsupported call to an unknown function (call to ijl_new_structv)
Stacktrace:
 [1] Test
   @ ~/dev/MR/sme.jl:5
 [2] Test
   @ ~/dev/MR/sme.jl:6
 [3] test
   @ ~/dev/MR/sme.jl:11
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] test
   @ ~/dev/MR/sme.jl:11
Reason: unsupported call to an unknown function (call to julia.pop_gc_frame)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/validation.jl:147
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:445 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:444 [inlined]
  [5] 
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
  [7] 
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:134
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
  [9] 
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
 [10] compile
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
 [11] #1145
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:254 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{…}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
 [13] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:253
 [15] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
 [16] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
 [17] macro expansion
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:369 [inlined]
 [18] macro expansion
    @ ./lock.jl:267 [inlined]
 [19] cufunction(f::typeof(test), tt::Type{Tuple{}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:364
 [20] cufunction(f::typeof(test), tt::Type{Tuple{}})
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:361
 [21] top-level scope
    @ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:112
Some type information was truncated. Use `show(err)` to see complete types.
test_gpu() @ Main ~/dev/MR/sme.jl:10
10 function test_gpu()::Core.Const(nothing)
11     i::Int32 = threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}.x::Int32
12     Test(i::Int32)::Test + Test(4)::Core.Const(Test{4}())
13     return nothing
14 end
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [v]scode: inlay types, [V]scode: diagnostics.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
   threadIdx()
   threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}.x
   Test(i::Int32)
   Test(4)
 • %5 = +(::Test,::Test{4})::Any
   ↩
+(::Test{M}, ::Test{N}) where {M, N} @ Main ~/dev/MR/sme.jl:8
8 (+(::(Test{M})::Test, ::(Test{N})::Core.Const(Test{4}())) where {M,N})::Test = Test{(M + N::Core.Const(4))::Any}::Type{Test{_A}} where _A()
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [v]scode: inlay types, [V]scode: diagnostics.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
   runtime M + N::Core.Const(4)
 • Test{(M + N::Core.Const(4))::Any}::Type{Test{_A}} where _A()

Your problem is that the type of Test(i) is not inferrable: the concrete type depends on the runtime value of i. GPU code must be fully type-inferrable.
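For illustration (an added sketch, not from the thread), you can see the instability on the CPU: each runtime value of i produces a different concrete type Test{N}, so inference can only conclude the abstract Test:

```julia
struct Test{N} end
Test(N) = Test{N}()

# Each runtime value produces a different concrete type...
@show typeof(Test(1))   # Test{1}
@show typeof(Test(2))   # Test{2}

# ...so inference cannot narrow the result to one concrete type.
rt = only(Base.return_types(Test, (Int,)))
println(isconcretetype(rt))   # false: the inferred type is abstract
```

On the CPU this falls back to dynamic dispatch; the GPU compiler has no runtime to fall back on, hence the InvalidIRError.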

You may be able to use Adapt.jl, as suggested in Using custom structs · CUDA.jl.
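As one possible workaround (a hypothetical rewrite, not the original code): store the value in a field instead of a type parameter, so every operand has the same concrete type and the kernel stays inferrable:

```julia
# Hypothetical rewrite: keep the value as a field, not a type parameter.
struct TestVal
    n::Int32
end

Base.:+(a::TestVal, b::TestVal) = TestVal(a.n + b.n)

# The return type is always TestVal, no matter what i is at runtime:
f(i) = TestVal(Int32(i)) + TestVal(Int32(4))

@show f(3)   # TestVal(7)
```

In the kernel, `TestVal(threadIdx().x) + TestVal(Int32(4))` then compiles, since `threadIdx().x` is already an Int32 and no runtime value ends up in a type parameter.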


Thank you, I understand. I am also trying Adapt.