Metal throws ERROR: Exception handler triggered on unmanaged thread

This is my very first attempt at using the M2 GPU, with Julia 1.9-rc2 (aarch64). The following implements a straightforward imperative bitonic sort. However, when run from the REPL, it occasionally throws

ERROR: Exception handler triggered on unmanaged thread.
[56040] signal (10.1): Bus error: 10
in expression starting at REPL[4]:1
unknown function (ip: 0x2bbe70208)
MTLDispatchListApply at /System/Library/Frameworks/Metal.framework/Versions/A/Metal (unknown line)
Allocations: 27896731 (Pool: 27862763; Big: 33968); GC: 44

So far, however, I have not seen the error when running it from the terminal with `julia1.9 --project=. mbitonic.jl`.

Any hints or help would be appreciated :wink:

using Random, ThreadsX, Metal

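# One compare-exchange pass of the bitonic network.
# Each thread looks at index i and its partner i ⊻ jj; kk selects the sort direction.
function vxchange!(a, jj, kk)
  i = thread_position_in_grid_1d()-1  # zero-based global thread index

  ij = i ⊻ jj  # partner index
  if ij > i  # only the lower index of each pair performs the swap
    # ascending block: keep the smaller value at the lower index
    if (i & kk) == 0 && a[i + 1] > a[ij + 1]
      a[i + 1], a[ij + 1] = a[ij + 1], a[i + 1]
    end
    # descending block: keep the larger value at the lower index
    if (i & kk) != 0 && a[i + 1] < a[ij + 1]
      a[i + 1], a[ij + 1] = a[ij + 1], a[i + 1]
    end
  end
  nothing
end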
 
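# Host-side driver; assumes length(a) is a power of two and at least 2^10.
function pibitonic!(a)
  n = length(a)
  q = Int(log2(n))  # number of merge stages
  for k = 1:q
    kk = 1 << k  # size of the blocks being merged in this stage
    for j = k-1:-1:0
      jj = 1 << j  # distance between compared elements
      # one kernel launch per (k, j) pass, one thread per element
      @metal threads=2^10 groups=n ÷ 2^10 vxchange!(a, jj, kk)
    end
  end
  nothing
end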
 
function test(n)
  a = rand(Float32,n)
  b = MtlArray(copy(a))

  @time ThreadsX.sort!(a)
  @time pibitonic!(b)

  @assert Array(b) == a
end

for i=1:5
  test(2^24)
end
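
To check the index logic independently of the GPU, the same compare-exchange network can be run on the CPU. The following is only a sketch: `cpu_xchange!` and `cpu_bitonic!` are hypothetical helper names that are not part of the original post, and the loop over `i` stands in for the thread index. If this version sorts correctly, the bus error is unlikely to come from the kernel's indexing.

```julia
# CPU sketch of the same bitonic network (hypothetical helpers, no GPU needed).
# Each call to cpu_xchange! mirrors one kernel launch of vxchange!.
function cpu_xchange!(a, jj, kk)
  for i in 0:length(a)-1            # i plays the role of the zero-based thread index
    ij = i ⊻ jj                     # partner index
    if ij > i                       # handle each pair once, from its lower index
      ascending = (i & kk) == 0
      x, y = a[i + 1], a[ij + 1]
      if (ascending && x > y) || (!ascending && x < y)
        a[i + 1], a[ij + 1] = y, x
      end
    end
  end
  nothing
end

function cpu_bitonic!(a)
  n = length(a)
  @assert ispow2(n) "this sketch assumes a power-of-two length"
  q = trailing_zeros(n)             # exact log2 for powers of two
  for k in 1:q
    kk = 1 << k
    for j in k-1:-1:0
      cpu_xchange!(a, 1 << j, kk)
    end
  end
  a
end

# Compare against Base.sort on a small random input.
a = rand(Float32, 2^10)
@assert cpu_bitonic!(copy(a)) == sort(a)
```

Within one pass every index belongs to exactly one pair, so the sequential loop gives the same result as running the passes in parallel.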

This is a known issue on Julia 1.9; see Sporadic `ERROR: Exception handler triggered on unmanaged thread.` · Issue #138 · JuliaGPU/Metal.jl · GitHub. It is best to use 1.8 for the time being.

Interestingly, ThreadsX.sort! is substantially slower on Julia 1.8.5 (Intel build) than on 1.9-rc2 (native).
@tkf might be interested in this observation.