Launching a Metal kernel from a thread

I’m trying to get into some basic GPU programming to run an analysis on data that would be infeasible to run on the CPU.

I have written a GPU kernel to compute the correlation length function of some data. I call this kernel from a function that sets up all the necessary data, such as the Metal arrays, and afterwards converts them back to normal arrays and does some additional calculations.

Running the function from the REPL works fine and I get good results. However, running the function on a thread (i.e. by calling it with Threads.@spawn) fails when the Metal kernel is called: either I get a bus error, or the whole REPL just hangs.
The error I’m getting is:

[6982] signal (10.1): Bus error: 10
in expression starting at REPL[1]:1 
jl_gc_pool_alloc_inner at /Applications/Julia-1.9.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_init_root_task at /Applications/Julia-1.9.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_adopt_thread at /Applications/Julia-1.9.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.9.dylib (unknown line)
unknown function (ip: 0x30e8ac2ab)
MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
Allocations: 69125063 (Pool: 69070725; Big: 54338); GC: 270

Naively, I thought this would just work, but I don’t know much about GPU programming. Can anyone help me understand why I’m getting this behavior?

There seems to be some interaction with the garbage collector. I have seemingly fixed the error by temporarily turning the garbage collector off inside the function, like this:

function correlationLength(somestruct)
  GC.enable(false)
  # Setup data
  ...

  # Launch Kernel
  Metal.@sync @metal corrkernel(...)
  
  # Some sequential operations
  ...

  GC.enable(true)
  return data

end
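
One weakness of the workaround above is that the garbage collector stays off if anything between GC.enable(false) and GC.enable(true) throws. A minimal sketch of a safer variant, assuming nothing beyond Base: the helper name with_gc_disabled and the stand-in fake_analysis are hypothetical and not part of Metal.jl or of the original function. It relies on GC.enable returning the previous GC state, and it remains a workaround rather than a fix for the underlying bug, since memory use can grow while collection is off.

```julia
# Hypothetical helper (not part of Base or Metal.jl): run `f` with the
# garbage collector disabled and restore the previous state afterwards,
# even if `f` throws.
function with_gc_disabled(f)
    was_enabled = GC.enable(false)   # GC.enable returns the previous state
    try
        return f()
    finally
        GC.enable(was_enabled)       # restore whatever state was active before
    end
end

# Stand-in for the real work; in the thread above this would be the call
# to correlationLength(somestruct), which launches the Metal kernel.
fake_analysis() = sum(abs2, rand(1_000))

# Run the wrapped work on a task and wait for its result.
task = Threads.@spawn with_gc_disabled(fake_analysis)
result = fetch(task)
```

With this in place, the body of correlationLength would not need its own GC.enable calls; the caller would wrap it instead, for example Threads.@spawn with_gc_disabled(() -> correlationLength(somestruct)).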

Is this a bug or expected?

This is a bug; see “Crash during MTLDispatchListApply” (Issue #225, JuliaGPU/Metal.jl). I thought I had fixed this in Julia 1.9.2, see “Command buffer callbacks can cause bus error during thread adoption” (Issue #138, JuliaGPU/Metal.jl), but apparently there’s something else going on.

Which version of Julia 1.9 are you using?
Also, if you have a good reproducer, please add it to the above issue.

Interesting. I was on 1.9.1. After updating to 1.9.2 I no longer seem to get the bus error, but now the REPL always hangs. Turning off the garbage collector inside the function still seems to avoid this.