Hi all. I have run into a strange problem and would like to ask a question.
The CPU version of the function below works, but the GPU version fails. Why? Is this kind of operator overload allowed inside a CUDA kernel?
It looks as if CUDA is confusing Int32 and Int64.
Thank you in advance.
using CUDA
import Base: +

struct Test{N} end
Test(N) = Test{N}()

+(::Test{M}, ::Test{N}) where {M,N} = Test{M + N}()

# GPU version: fails to compile
function test_gpu()
    i = threadIdx().x
    Test(i) + Test(4)
    return nothing
end

CUDA.@cuda test_gpu()

# CPU version: works
function test_cpu()
    i = 1
    Test(i) + Test(4)
end

test_cpu()
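
To illustrate the Int32/Int64 suspicion, here is a small CPU-side check (a sketch, not part of the original file; I would expect it to succeed, since the Int32 and Int64 parameters just promote inside +):

# CPU-side check with an explicit Int32, mimicking the element type of threadIdx().x
Test(Int32(1)) + Test(4)    # expected result: Test{5}(), same as with a plain Int

The CUDA.@cuda call above, however, fails with the error below.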
ERROR: InvalidIRError: compiling MethodInstance for test() resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Reason: unsupported call to an unknown function (call to julia.push_gc_frame)
Reason: unsupported call to an unknown function (call to julia.get_gc_frame_slot)
Reason: unsupported call to an unknown function (call to jl_f_apply_type)
Stacktrace:
[1] Test
@ ~/dev/MR/sme.jl:6
[2] test
@ ~/dev/MR/sme.jl:11
Reason: unsupported call to an unknown function (call to ijl_new_structv)
Stacktrace:
[1] Test
@ ~/dev/MR/sme.jl:5
[2] Test
@ ~/dev/MR/sme.jl:6
[3] test
@ ~/dev/MR/sme.jl:11
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
[1] test
@ ~/dev/MR/sme.jl:11
Reason: unsupported call to an unknown function (call to julia.pop_gc_frame)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/validation.jl:147
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:445 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:444 [inlined]
[5]
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
[6] emit_llvm
@ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
[7]
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:134
[8] codegen
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
[9]
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
[10] compile
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
[11] #1145
@ ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:254 [inlined]
[12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{…}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[13] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[14] compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/compilation.jl:253
[15] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
[16] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
[17] macro expansion
@ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:369 [inlined]
[18] macro expansion
@ ./lock.jl:267 [inlined]
[19] cufunction(f::typeof(test), tt::Type{Tuple{}}; kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:364
[20] cufunction(f::typeof(test), tt::Type{Tuple{}})
@ CUDA ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:361
[21] top-level scope
@ ~/.julia/packages/CUDA/Tl08O/src/compiler/execution.jl:112
Some type information was truncated. Use `show(err)` to see complete types.
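
Following the hint in the error message, I introspected the failing kernel, roughly like this (a sketch of the workflow the hint suggests; it assumes Cthulhu.jl is installed):

using Cthulhu

try
    CUDA.@cuda test_gpu()
catch err
    # as suggested by the hint printed with the InvalidIRError
    code_typed(err; interactive = true)
end

The resulting session is below; note that the + call is inferred as Any and that Test{M + N} ends up as a runtime apply_type.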
test_gpu() @ Main ~/dev/MR/sme.jl:10
10 function test_gpu()::Core.Const(nothing)
11 i::Int32 = threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}.x::Int32
12 Test(i::Int32)::Test + Test(4)::Core.Const(Test{4}())
13 return nothing
14 end
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [v]scode: inlay types, [V]scode: diagnostics.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
threadIdx()
threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}.x
Test(i::Int32)
Test(4)
• %5 = +(::Test,::Test{4})::Any
↩
+(::Test{M}, ::Test{N}) where {M, N} @ Main ~/dev/MR/sme.jl:8
8 (+(::(Test{M})::Test, ::(Test{N})::Core.Const(Test{4}())) where {M,N})::Test = Test{(M + N::Core.Const(4))::Any}::Type{Test{_A}} where _A()
runtime M + N::Core.Const(4)
• Test{(M + N::Core.Const(4))::Any}::Type{Test{_A}} where _A()
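
For comparison, this is how I would inspect the CPU path (a sketch; @code_warntype comes from InteractiveUtils and @descend from Cthulhu; I would expect everything to be concrete there, since i is the literal 1):

using InteractiveUtils   # @code_warntype
using Cthulhu            # @descend

@code_warntype test_cpu()   # static view of the CPU version
@descend test_cpu()         # interactive view, analogous to the GPU session above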