Title: Feedback Needed on [CUDA kernel crash very occasionally when MPI.jl is just loaded.] (GitHub Issue #2429)
Hello Julia community,
I have run into an issue where a CUDA.jl kernel crashes very occasionally when MPI.jl is merely loaded, and I have reported it on GitHub.
GitHub Issue Link: GitHub Issue #2429
Reproduce:
You may need to install a system MPI library first, then set up the project with:
julia --project -e 'using Pkg; Pkg.add(["CUDA", "MPIPreferences", "MPI"]); using MPIPreferences; MPIPreferences.use_system_binary()'
Then run the following code to reproduce the crash:
using CUDA
using MPI

MPI.versioninfo()

f(i) = i^2
a = CUDA.zeros(Float64, 128^3)

function test(a)
    X = range(0, 1, length(a))

    function _kern()
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x - 1
        stx = gridDim().x * blockDim().x
        i = idx + 1
        while i <= length(a)
            a[i] = f(i) * X[i]
            i += stx
        end
        return nothing
    end

    kern = @cuda launch = false _kern()
    config = launch_configuration(kern.fun)
    threads = config.threads
    size = length(a)
    blocks = cld.(size, threads)
    CUDA.@sync kern(; threads, blocks)
end

for i in 1:1000000
    @show i
    test(a)
end
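Since the crash may depend on the environment, anyone who tries the reproducer may want to include version details in their reply. A minimal sketch for collecting them, assuming CUDA.jl and MPI.jl are installed in the active project and a GPU is available:

```julia
using CUDA
using MPI

# Print the CUDA toolkit, driver, and device details seen by CUDA.jl.
CUDA.versioninfo()

# Print the MPI implementation and library that MPI.jl is configured to use.
MPI.versioninfo()
```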
Purpose:
I am posting this here to get additional feedback and insights from the community. Any suggestions, potential workarounds, or similar experiences would be greatly appreciated.
Thank you for your help!
Best regards.