Metal.jl weird behavior above 2^27

Hi! I am new to GPU programming, and I have been experimenting with Metal.jl.

As practice, I wrote this little program to estimate pi using the Monte Carlo method.

using Metal
using Random

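# Map a uniform sample in [0, 1) to [-1, 1) and square it (one squared coordinate).
throw_dart(a) = (2*a - 1)^2
# A dart is inside the unit circle when its squared distance from the origin is <= 1.
dart_hit_circle(a) = a <= 1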

function est_pi_gpu(N)
    darts = Metal.rand(N, 2)
    darts = mapreduce(throw_dart, +, darts; dims=2)
    hits = mapreduce(dart_hit_circle, +, darts)

    return 4 * hits / N
end

function est_pi_cpu(N)
    darts = Random.rand(N, 2)
    darts = mapreduce(throw_dart, +, darts; dims=2)
    hits = mapreduce(dart_hit_circle, +, darts)

    return 4 * hits / N
end

dart_throw_count = 2^27

println("GPU Attempt:")
@time println(est_pi_gpu(dart_throw_count))

println("\nCPU Attempt:")
@time println(est_pi_cpu(dart_throw_count))

If I set dart_throw_count to anything below 2^27, this works fine. In the first run I set dart_throw_count = 2^27 - 1, and it returned a proper pi estimate. But as soon as I set dart_throw_count = 2^27 or higher (the second run), the GPU version just returns 4.

So, only on the GPU, the line hits = mapreduce(dart_hit_circle, +, darts) returns the length of the array when the length is >= 2^27. In other words, every dart is counted as a hit, so 4 * hits / N comes out as exactly 4.
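One way to narrow this down is to run the same reduction twice on the same data: once where the array lives, and once after copying it to host memory with Array. The sketch below does that. The helper name compare_hits is hypothetical (it is not part of Metal.jl), and it assumes Array(r2) copies a GPU array back to the CPU.

```julia
dart_hit_circle(a) = a <= 1

# Hypothetical helper: count hits on the array as given (on the GPU if `r2`
# is a GPU array) and again on a host copy of the very same values.
function compare_hits(r2)
    device_hits = mapreduce(dart_hit_circle, +, r2)
    host_hits = mapreduce(dart_hit_circle, +, Array(r2))
    return (device = device_hits, host = host_hits, n = length(r2))
end
```

With Metal loaded, calling compare_hits(mapreduce(throw_dart, +, Metal.rand(2^27, 2); dims=2)) should show whether the two counts disagree. If device equals n while host looks reasonable, the random numbers and the first mapreduce are probably fine, and the problem is in the final reduction on the GPU.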

Does anyone have any ideas why this is happening?

Perhaps this is related: Different result for personal argmax on CPU and GPU if array size is large enough · Issue #476 · JuliaGPU/Metal.jl · GitHub