CUDA kernel crashes very occasionally when MPI.jl is merely loaded

Title: Feedback needed on “CUDA kernel crashes very occasionally when MPI.jl is merely loaded” (GitHub Issue #2429)

Hello Julia community,

I have encountered an issue in which a CUDA.jl kernel very occasionally crashes when MPI.jl is merely loaded, and I have reported it on GitHub.

GitHub Issue Link: GitHub Issue #2429

Steps to reproduce:

You may need to install a system MPI library first. Then set up the packages by running:

julia --project -e 'using Pkg; Pkg.add(["CUDA", "MPIPreferences", "MPI"]); using MPIPreferences; MPIPreferences.use_system_binary()'

Then run the following code to reproduce the crash:

using CUDA
using MPI
MPI.versioninfo()

f(i) = i^2
a = CUDA.zeros(Float64, 128^3)

function test(a)
    X = range(0, 1, length(a))

    function _kern()
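        # Zero-based global index of this thread
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x - 1

        # Stride of the grid-stride loop: total number of threads launched
        stx = gridDim().x * blockDim().x

        # Convert to a one-based index, since Julia arrays start at 1
        i = idx + 1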
        while i <= length(a)
            a[i] = f(i) * X[i]
            i += stx
        end
        return nothing
    end

    kern = @cuda launch = false _kern()
    config = launch_configuration(kern.fun)
    threads = config.threads
    size = length(a)
    blocks = cld.(size, threads)
    CUDA.@sync kern(; threads, blocks)
end

for i in 1:1000000
    @show i
    test(a)
end

Purpose:

I am posting here to get additional feedback and insight from the community. Any suggestions, potential workarounds, or reports of similar experiences would be greatly appreciated.

Thank you for your help!

Best regards.

Please don’t triple-post. Most of us read both GitHub and Discourse, and splitting the discussion across three places is not helpful.

Let’s use “CUDA kernel very occasionally crash when MPI.jl (only when I use local binary.) is loaded.” (Issue #846 · JuliaParallel/MPI.jl · GitHub) as the primary touchpoint.

I apologize; I was not aware of this etiquette. Thank you. I will close this topic.