Passing tuple of structs to a CUDA.jl kernel

Hi,

I have a problem with passing a tuple of structs to a CUDA.jl kernel.
Here is an MWE:

using CUDA, Adapt, StaticArrays

struct MyType1 end
struct MyType2 end

struct MyStruct{T,N,L}
    type :: T
    data :: SMatrix{N,2,Float64,L}
end

Adapt.@adapt_structure MyStruct # this is not required probably
    
s1 = MyStruct(MyType1(), SMatrix{100,2}(rand(100,2)))
s2 = MyStruct(MyType2(), SMatrix{100,2}(rand(100,2)))
    
t = (s1,s2)
 
isbits(t) #true
    
function kernel(t)
    @cushow t[threadIdx().x].data[1,1]
    return nothing
end
    
@cuda threads=2 kernel(t)

but I am getting an error:

InvalidIRError: compiling kernel kernel(Tuple{MyStruct{MyType1, 100, 200}, MyStruct{MyType2, 100, 200}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_getfield)
Stacktrace:
 [1] getindex
   @ .\tuple.jl:30
 [2] macro expansion
   @ C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\device\intrinsics\output.jl:248
 [3] kernel
   @ .\In[19]:19
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(kernel), Tuple{Tuple{MyStruct{MyType1, 100, 200}, MyStruct{MyType2, 100, 200}}}}}, args::LLVM.Module)
    @ GPUCompiler C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\validation.jl:124
  [2] macro expansion
    @ C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\driver.jl:386 [inlined]
  [3] macro expansion
    @ C:\Users\lodygaw\.julia\packages\TimerOutputs\nDhDw\src\TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\driver.jl:384 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:332
  [7] #260
    @ C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:325 [inlined]
  [8] JuliaContext(f::CUDA.var"#260#261"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(kernel), Tuple{Tuple{MyStruct{MyType1, 100, 200}, MyStruct{MyType2, 100, 200}}}}}})
    @ GPUCompiler C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:324
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\lodygaw\.julia\packages\GPUCompiler\1FdJy\src\cache.jl:90
 [11] cufunction(f::typeof(kernel), tt::Type{Tuple{Tuple{MyStruct{MyType1, 100, 200}, MyStruct{MyType2, 100, 200}}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:297
 [12] cufunction(f::typeof(kernel), tt::Type{Tuple{Tuple{MyStruct{MyType1, 100, 200}, MyStruct{MyType2, 100, 200}}}})
    @ CUDA C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:291
 [13] top-level scope
    @ C:\Users\lodygaw\.julia\packages\CUDA\Y6ceW\src\compiler\execution.jl:102
 [14] eval
    @ .\boot.jl:373 [inlined]
 [15] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1196

It seems quite basic to me but maybe it is not supported?

Why do you assume it’s not supported? There are many reasons code may fail to compile, most typically user error. Did you try the debugging suggestion that is part of the error message?

EDIT: Having actually looked at your code (but still, next time please try the suggestion from the error message first), there’s an inherent type instability in it: indexing t with a runtime index such as threadIdx().x can return either of two differently-typed versions of MyStruct, and that kind of instability is not supported on the GPU. You can spot the instability using code_warntype(getindex, Tuple{typeof(t), Int}).
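
For illustration, here is a minimal CPU-only sketch of the same check. It is not the MWE itself: the structs are simplified stand-ins (a plain Float64 field replaces the SMatrix so that no packages are needed), and the helper name inferred is made up for this example. It asks inference what indexing a tuple with a runtime Int returns, once for a tuple with two element types and once for a tuple with a single element type.

```julia
# Simplified stand-ins for the structs in the MWE (assumption: a Float64
# field replaces the SMatrix so the sketch needs no packages).
struct MyType1 end
struct MyType2 end

struct MyStruct{T}
    type  :: T
    value :: Float64
end

mixed = (MyStruct(MyType1(), 1.0), MyStruct(MyType2(), 2.0))  # two element types
same  = (MyStruct(MyType1(), 1.0), MyStruct(MyType1(), 2.0))  # one element type

# Hypothetical helper: the return type inference assigns to `t[i]`
# when `i` is only known to be an Int (as with threadIdx().x).
inferred(t) = first(Base.return_types(getindex, Tuple{typeof(t), Int}))

println(inferred(mixed), " concrete? ", isconcretetype(inferred(mixed)))
println(inferred(same), " concrete? ", isconcretetype(inferred(same)))
```

The mixed tuple should report a non-concrete result type, which is the instability described above, while the tuple with a single element type should report a concrete one.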


I was well aware that the error was mine, whether from lack of knowledge or some mistake. I spotted the problem using code_typed as ErrorException("fatal error in type inference (type bound)"))::Union{}, which is not the case on the CPU, and that’s exactly why I assumed that “maybe it is not supported” due to something I was not aware of. In this case, that something was the fact that such instabilities are not supported on the GPU. That was also not obvious to me, because defining t = CuArray{Union{typeof(s1),typeof(s2)}}([s1,s2]) instead of a tuple works fine, and I assume it faces the same instability? I should probably have presented more detailed information about my debugging.
Anyway, thank you for your help :slight_smile:

Type-unstable code in general is not supported. In some cases, the compiler can do union-splitting and support some patterns statically, but even then there can be issues, such as JuliaGPU/CUDA.jl issue #1385, “Manual union splitting past limits leads to incorrect codegen”.

That error itself should be fine though, and is just a safety check inserted by Julia making sure the results from inference are sound. It may even be removed soon.
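
As a rough sketch of what handling this statically can look like, the snippet below walks the tuple element by element, so every branch works on a concretely typed element and no dynamic getfield on the whole tuple is needed. This is an assumption-laden illustration, not something taken from CUDA.jl: apply_at and getvalue are hypothetical names, the structs are simplified stand-ins for the MWE, the NaN fallback is an arbitrary choice, and it is only exercised on the CPU here. Whether it compiles as a kernel in a given CUDA.jl version is not verified.

```julia
# Simplified stand-ins for the structs in the MWE (assumption: a Float64
# field replaces the SMatrix so the sketch needs no packages).
struct MyType1 end
struct MyType2 end

struct MyStruct{T}
    type  :: T
    value :: Float64
end

# Hypothetical helper: apply `f` to the i-th element by peeling the tuple.
# Each recursion step sees a shorter tuple type, so `first(t)` is always
# concretely typed inside its branch.
@inline apply_at(f, t::Tuple, i::Int) =
    i == 1 ? f(first(t)) : apply_at(f, Base.tail(t), i - 1)

# Arbitrary fallback for an out-of-range index (an assumption of this sketch).
@inline apply_at(f, ::Tuple{}, i::Int) = NaN

getvalue(s) = s.value  # returns Float64 for every element type

t = (MyStruct(MyType1(), 1.0), MyStruct(MyType2(), 2.0))

println(apply_at(getvalue, t, 1))
println(apply_at(getvalue, t, 2))

# Inspect what inference makes of the call with a runtime index.
println(Base.return_types(apply_at, Tuple{typeof(getvalue), typeof(t), Int}))
```

The pattern only helps when f returns the same type for every element; if it does not, the result is a union again and the caveats above apply.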


Oh, that’s very valuable knowledge. I will have to refactor my original code to avoid such containers.
