CUDA suddenly crashes with `--check-bounds=no`, used to work fine

Hi all,

I have some code that used to work perfectly fine with `--check-bounds=no`, but recently it started crashing with an illegal-memory-access error (stack trace below). If I run it with `--check-bounds=yes`, it works without any issues.

I noticed that the stack trace goes through `link` in the CUDA.jl compiler (via GPUCompiler), so the error seems to surface while a kernel module is being loaded. I’m posting here to see if anyone has insight into what’s going on.

This isn’t the first time I’ve seen `--check-bounds=no` cause strange behavior — I ran into a different error in another package before. So now I’m wondering: is it simply unsafe to use `--check-bounds=no` in general?

  • I understand `--check-bounds=no` is commonly used for performance in production builds, but has something changed recently in Julia or in some libraries that makes this more fragile?
  • Has anyone else experienced similar issues or found reliable ways to avoid them?

Any thoughts or suggestions would be greatly appreciated.
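For context, here is a minimal sketch of the kind of failure mode I mean. This is not my actual code — the kernel, names, and sizes are made up for illustration — but it shows how an off-by-one read in a KernelAbstractions kernel behaves differently under the two flags: with `--check-bounds=yes` the out-of-bounds access is caught and reported, while with `--check-bounds=no` the check is compiled away and the bad load is undefined behavior that can surface later as code 700 (`ERROR_ILLEGAL_ADDRESS`) at an unrelated synchronization point.

```julia
# Hypothetical reproducer sketch — run once with `julia --check-bounds=yes`
# and once with `julia --check-bounds=no` to compare behavior.
using CUDA, KernelAbstractions

@kernel function copy_shifted!(dst, src)
    i = @index(Global)
    # Off-by-one bug: reads src[i + 1], which is out of bounds
    # for the last work item.
    dst[i] = src[i + 1]
end

function demo()
    n = 1024
    src = CUDA.rand(Float32, n)
    dst = CUDA.zeros(Float32, n)
    backend = get_backend(src)
    copy_shifted!(backend)(dst, src; ndrange = n)
    KernelAbstractions.synchronize(backend)
end

demo()
# --check-bounds=yes: the out-of-bounds access is detected and reported.
# --check-bounds=no:  the load is emitted unchecked; the behavior is
#                     undefined and may show up much later as an illegal
#                     memory access during an unrelated device operation.
```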

julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
julia> CUDA.versioninfo()
CUDA runtime 12.6, local installation
CUDA driver 12.9
NVIDIA driver 570.124.6

CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+570.124.6

Julia packages:
- CUDA: 5.8.1
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.6
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA H100 (sm_90, 92.164 GiB / 93.584 GiB available)
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/libcuda.jl:30
  [2] nonblocking_synchronize(val::CUDA.CuContext)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:172
  [3] device_synchronize(; blocking::Bool, spin::Bool)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:184
  [4] device_synchronize
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:178 [inlined]
  [5] checked_cuModuleLoadDataEx(_module::Base.RefValue{Ptr{CUDA.CUmod_st}}, image::Ptr{UInt8}, numOptions::Int64, options::Vector{CUDA.CUjit_option_enum}, optionValues::Vector{Ptr{Nothing}})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:18
  [6] CUDA.CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:60
  [7] CuModule
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:49 [inlined]
  [8] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/compilation.jl:409
  [9] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:270
 [10] cached_compilation(cache::Dict{Any, CUDA.CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:159
 [11] macro expansion
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:373 [inlined]
 [12] macro expansion
    @ ./lock.jl:273 [inlined]
 [13] cufunction(f::var"#gpu_kern#20"{CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}}, tt::Type{Tuple{…}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Nothing})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:368
 [14] macro expansion
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:112 [inlined]
 [15] (::KernelAbstractions.Kernel{…})(args::Val{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/CUDAKernels.jl:122

 [21] top-level scope
    @ REPL[29]:1
Some type information was truncated. Use `show(err)` to see complete types.