Converting the result of round or floor to Int in Metal

I’m getting into GPU programming and trying to write a simple kernel that calculates the correlation function of an image. I’ve run into a problem: I cannot convert the result of the round or floor functions to an Int.

If I try to do the following as a very simple prototype

using Metal

result = MtlArray(zeros(Int32, 10))

function test(result)
    i = thread_position_in_grid_1d()
    i1 = 1.0f0
    i2 = 3.3f0

    # Round the Float32 sum, then convert it to Int32 (the line that fails to compile)
    result[i] = Int32(round(i1 + i2))

    return
end

@metal threads = 10 groups = 1 test(result)

And run it, I get the following. Why is this not possible?

ERROR: InvalidIRError: compiling MethodInstance for test(::MtlDeviceVector{Int32, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:213
 [6] Int32
   @ ./float.jl:900
 [7] test
   @ ~/Library/Mobile Documents/com~apple~CloudDocs/Documents/PhD/JuliaProjects/InteractiveIsing.jl/Tests/MetalSampling.jl:53
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/validation.jl:145
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:407 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:406 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:120
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:94 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:90
 [10] compile
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:81 [inlined]
 [11] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:59
 [12] #59
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:55 [inlined]
 [13] LLVM.ThreadSafeContext(f::Metal.var"#59#60"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ LLVM ~/.julia/packages/LLVM/TLGyi/src/executionengine/ts_module.jl:14
 [14] JuliaContext(f::Metal.var"#59#60"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:35
 [15] compile
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:54 [inlined]
 [16] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/cache.jl:53
 [17] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/cache.jl:37
 [18] macro expansion
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:161 [inlined]
 [19] macro expansion
    @ ./lock.jl:267 [inlined]
 [20] mtlfunction(f::typeof(test), tt::Type{Tuple{MtlDeviceVector{Int32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:157
 [21] mtlfunction(f::typeof(test), tt::Type{Tuple{MtlDeviceVector{Int32, 1}}})
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:155
 [22] top-level scope
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:77

Does round(Int32, i1+i2) work?


Wow, yes this works! Why does the other code error, however? I assumed there were no type instabilities in the code. I assume round(x) returns a ::typeof(x) and Int32 obviously returns an Int32.
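The types involved can be checked on the CPU without Metal. The following is only a sketch in plain Julia: it confirms that one-argument round keeps the Float32 type, and it also shows that the Int32 constructor is a checked conversion that throws when the value is not exactly representable. The variable names are illustrative and not from the original kernel file.

```julia
x = 1.0f0 + 3.3f0

# One-argument round keeps the floating-point type of its input.
@assert round(x) isa Float32
@assert round(x) == 4.0f0

# Int32(...) is a checked conversion: fine for an exactly representable value...
@assert Int32(round(x)) === Int32(4)

# ...but it throws InexactError for a value with a fractional part.
threw_inexact = try
    Int32(4.3f0)
    false
catch err
    err isa InexactError
end
@assert threw_inexact

# Two-argument round returns the requested integer type directly.
@assert round(Int32, x) === Int32(4)
```

So the code is type-stable on the CPU; whatever differs on the GPU would have to come from the conversion step rather than from the return types.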

Okay, I seem to really be misunderstanding something. Your suggestion works, but if I now try: round(Int32, sqrt(i1+i2))

I get:

ERROR: InvalidIRError: compiling MethodInstance for test(::MtlDeviceVector{Int32, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/BxfIW/src/runtime.jl:213
 [6] trunc
   @ ./float.jl:893
 [7] round
   @ ./float.jl:384
 [8] test
   @ ~/Library/Mobile Documents/com~apple~CloudDocs/Documents/PhD/JuliaProjects/InteractiveIsing.jl/Tests/MetalSampling.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/validation.jl:145
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:407 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:406 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:120
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:94 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:90
 [10] compile
    @ ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:81 [inlined]
 [11] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:59
 [12] #59
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:55 [inlined]
 [13] LLVM.ThreadSafeContext(f::Metal.var"#59#60"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ LLVM ~/.julia/packages/LLVM/TLGyi/src/executionengine/ts_module.jl:14
 [14] JuliaContext(f::Metal.var"#59#60"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/driver.jl:35
 [15] compile
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:54 [inlined]
 [16] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/cache.jl:53
 [17] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/BxfIW/src/cache.jl:37
 [18] macro expansion
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:161 [inlined]
 [19] macro expansion
    @ ./lock.jl:267 [inlined]
 [20] mtlfunction(f::typeof(test), tt::Type{Tuple{MtlDeviceVector{Int32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:157
 [21] mtlfunction(f::typeof(test), tt::Type{Tuple{MtlDeviceVector{Int32, 1}}})
    @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:155
 [22] top-level scope
    @ ~/.julia/packages/Metal/TtPHW/src/compiler/execution.jl:77

I think in all cases the problem is that, even if the return value has the correct type, it is important that the functions themselves do not use 64-bit types. For example, sqrt probably uses a Float64 if you provide integers as inputs…

(Maybe round( Float32( i1 + i2 ) ) works. But I’m also guessing.)

So you mean to say that there is probably some intermediate step in sqrt which uses Float64? I’m passing two Float32s into sqrt, though. Also, why does this problem not come up if I don’t use it as an input to the round function? If I just set result[i] = sqrt(i1+i2), it works fine.

Edit:

Of course, if I do result[i] = round(Int32, somefloat), I pass result = MtlArray(zeros(Int32, 10)) to the test function (and for floats I use the corresponding MtlArray(zeros(Float32, 10))). I made a typo in my opening post, which I have now fixed.
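On the Float64 question, the return types of sqrt can be checked on the CPU. This is a sketch in plain Julia and only inspects result types; it says nothing about intermediates inside a GPU implementation of sqrt.

```julia
# sqrt keeps Float32 when given a Float32 argument...
f32_result = sqrt(1.0f0 + 3.3f0)
@assert f32_result isa Float32

# ...but integer arguments are promoted to Float64.
int_result = sqrt(4)
@assert int_result isa Float64

# A Float64 literal (3.3 instead of 3.3f0) also promotes the whole expression.
mixed_result = sqrt(1.0f0 + 3.3)
@assert mixed_result isa Float64
```

With the literals used in the kernel (1.0f0 and 3.3f0), the result type stays Float32, so an accidental 64-bit promotion would need an integer or a literal without the f0 suffix somewhere in the expression.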

Oh, maybe someone else here can help better. I tried to look into the source code of round and got confused. (I will look at it later, since I want to know the solution as well.)


Not a type instability, but a checked conversion: round still returns a Float32, so the conversion to Int32 may throw (which is currently unsupported by Metal.jl). The typical workaround is to use unsafe_trunc(Int32, ...) or ...%Int32.
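A minimal CPU-side sketch of the unchecked conversions mentioned above. The helper name round_to_int32 is hypothetical, introduced only for illustration. The `% Int32` form is shown on an integer input, since that is the case I am sure it covers.

```julia
# Hypothetical helper: round to nearest, then convert without a range check.
# unsafe_trunc never throws; if the value does not fit in Int32 (or is NaN),
# the result is an arbitrary value, so the caller must guarantee the range.
round_to_int32(x::Float32) = unsafe_trunc(Int32, round(x))

@assert round_to_int32(1.0f0 + 3.3f0) === Int32(4)
@assert round_to_int32(sqrt(1.0f0 + 3.3f0)) === Int32(2)

# `% Int32` is the analogous unchecked (wrapping) conversion between integer
# types: it keeps the low 32 bits instead of throwing.
@assert (Int64(5) % Int32) === Int32(5)
@assert ((Int64(2)^32 + 5) % Int32) === Int32(5)
```

In the kernel from the question, the assignment would then presumably read result[i] = unsafe_trunc(Int32, round(sqrt(i1 + i2))). I have not run that on a Metal device here, so treat it as an assumption to verify.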
