I’m using CUDA.jl to carry out some diffusion simulations. It’s a pretty standard setup, but it makes fairly heavy use of `rem` and `mod` to bin particles to cells in a grid. To this end, we call `map!` on a pair of `Float32` `CuArray`s, e.g.:
```julia
using CUDA

N = 1_000_000
u = 1e-5 # or something
X = CUDA.rand(Float32, N)
X1 = CUDA.zeros(Float32, N)
map!(x -> CUDA.rem(x, u), X1, X)
```
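For reference, here is a CPU analogue of that call that I use as a sanity check (no GPU required; the `_cpu` names are just for this sketch):

```julia
# CPU analogue of the GPU map! call above, for cross-checking results.
N = 1_000_000
u = 1e-5 # same Float64 modulus as in the GPU version
X_cpu = rand(Float32, N)
X1_cpu = similar(X_cpu)
map!(x -> Float32(rem(x, u)), X1_cpu, X_cpu)
```

The result can then be compared element-wise against `Array(X1)` from the GPU run.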
We have to use this somewhat awkward call because e.g. `X1 .= rem.(X, u)` doesn’t actually produce the same result (the revenge of issue 748?).
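One thing I noticed while poking at this (my speculation, not verified against the GPU code path): `u = 1e-5` is a `Float64`, so `rem(x, u)` promotes the `Float32` element to `Float64` before taking the remainder, and only truncates back to `Float32` on assignment, which need not round the same way as pure `Float32` arithmetic:

```julia
x = rand(Float32)
u = 1e-5 # Float64, as above

r64 = rem(x, u)          # mixed Float32/Float64 arguments promote to Float64
r32 = rem(x, Float32(u)) # stays in Float32 throughout

# typeof(r64) is Float64, typeof(r32) is Float32;
# Float32(r64) may differ from r32 in the last bit.
```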
Anyway, this does work as intended, but it intermittently fails with an inexplicable out-of-bounds error. How can `map!` even go out of bounds? And why does it only happen sometimes?
I’ve verified that this happens on both a GTX 1660 OC and a Tesla V100, with the same error:
```
ERROR: Out-of-bounds array access.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: LoadError: KernelException: exception thrown during kernel execution on device GeForce GTX 1660
Stacktrace:
[1] check_exceptions()
@ CUDA C:\Users\James\.julia\packages\CUDA\9T5Sq\src\compiler\exceptions.jl:37
[2] device_synchronize
@ C:\Users\James\.julia\packages\CUDA\9T5Sq\lib\cudadrv\context.jl:322 [inlined]
[3] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA C:\Users\James\.julia\packages\CUDA\9T5Sq\lib\cudadrv\module.jl:41
[4] CuModule
@ C:\Users\James\.julia\packages\CUDA\9T5Sq\lib\cudadrv\module.jl:23 [inlined]
[5] cufunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
@ CUDA C:\Users\James\.julia\packages\CUDA\9T5Sq\src\compiler\execution.jl:442
[6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler C:\Users\James\.julia\packages\GPUCompiler\fG3xK\src\cache.jl:94
[7] cufunction(f::GPUArrays.var"#map_kernel#18"{Int64}, tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, DiffusionSimulator.var"#43#49"{Float32}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\James\.julia\packages\CUDA\9T5Sq\src\compiler\execution.jl:288
[8] cufunction
@ C:\Users\James\.julia\packages\CUDA\9T5Sq\src\compiler\execution.jl:282 [inlined]
[9] macro expansion
@ C:\Users\James\.julia\packages\CUDA\9T5Sq\src\compiler\execution.jl:102 [inlined]
[10] #launch_heuristic#233
@ C:\Users\James\.julia\packages\CUDA\9T5Sq\src\gpuarrays.jl:17 [inlined]
[11] map!(f::Function, dest::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, xs::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ GPUArrays C:\Users\James\.julia\packages\GPUArrays\UBzTm\src\host\broadcast.jl:130
[12] diff_sim_gpu(I::Matrix{Int32}, seq::Seq, simu::Simu)
@ DiffusionSimulator c:\Users\James\.julia\dev\DiffusionSimulator\src\DiffusionSimulator.jl:189
[13] top-level scope
@ c:\Users\James\.julia\dev\DiffusionSimulator\test\runtests.jl:70
```
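For completeness, I plan to re-run on debug level 2 as the message suggests, to get device stack traces; something like (the script path here is just my local test file):

```shell
# -g2 turns on full debug info, which CUDA.jl needs for device stack traces
julia -g2 --project test/runtests.jl
```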