CUDA suddenly crashes with `--check-bounds=no`, used to work fine

Hi all,

I have some code that used to work perfectly fine with `--check-bounds=no`, but recently it started crashing with an illegal-memory-access error (stack trace below). If I run it with `--check-bounds=yes`, it works without any issues.

I noticed that the stack trace goes through `link` in the CUDA.jl compiler (via GPUCompiler), so the error seems to surface while a kernel module is being loaded. I’m posting here to see if anyone has insight into what’s going on.

This isn’t the first time I’ve seen `--check-bounds=no` cause strange behavior — I ran into a different error in another package before. So now I’m wondering: is it simply unsafe to use `--check-bounds=no` in general?

  • I understand `--check-bounds=no` is commonly used for performance in production builds, but has something changed recently in Julia or in some libraries that makes this more fragile?
  • Has anyone else experienced similar issues or found reliable ways to avoid them?

Any thoughts or suggestions would be greatly appreciated.
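For context, here is a minimal sketch of the kind of failure mode I mean. This is not my actual code — the kernel, names, and sizes are made up for illustration — but it shows how an off-by-one read in a KernelAbstractions kernel behaves differently under the two flags: with `--check-bounds=yes` the out-of-bounds access is caught and reported, while with `--check-bounds=no` the check is compiled away and the bad load is undefined behavior that can surface later as code 700 (`ERROR_ILLEGAL_ADDRESS`) at an unrelated synchronization point.

```julia
# Hypothetical reproducer sketch — run once with `julia --check-bounds=yes`
# and once with `julia --check-bounds=no` to compare behavior.
using CUDA, KernelAbstractions

@kernel function copy_shifted!(dst, src)
    i = @index(Global)
    # Off-by-one bug: reads src[i + 1], which is out of bounds
    # for the last work item.
    dst[i] = src[i + 1]
end

function demo()
    n = 1024
    src = CUDA.rand(Float32, n)
    dst = CUDA.zeros(Float32, n)
    backend = get_backend(src)
    copy_shifted!(backend)(dst, src; ndrange = n)
    KernelAbstractions.synchronize(backend)
end

demo()
# --check-bounds=yes: the out-of-bounds access is detected and reported.
# --check-bounds=no:  the load is emitted unchecked; the behavior is
#                     undefined and may show up much later as an illegal
#                     memory access during an unrelated device operation.
```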

julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
julia> CUDA.versioninfo()
CUDA runtime 12.6, local installation
CUDA driver 12.9
NVIDIA driver 570.124.6

CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+570.124.6

Julia packages:
- CUDA: 5.8.1
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.6
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA H100 (sm_90, 92.164 GiB / 93.584 GiB available)
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/libcuda.jl:30
  [2] nonblocking_synchronize(val::CUDA.CuContext)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:172
  [3] device_synchronize(; blocking::Bool, spin::Bool)
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:184
  [4] device_synchronize
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:178 [inlined]
  [5] checked_cuModuleLoadDataEx(_module::Base.RefValue{Ptr{CUDA.CUmod_st}}, image::Ptr{UInt8}, numOptions::Int64, options::Vector{CUDA.CUjit_option_enum}, optionValues::Vector{Ptr{Nothing}})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:18
  [6] CUDA.CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:60
  [7] CuModule
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:49 [inlined]
  [8] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/compilation.jl:409
  [9] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:270
 [10] cached_compilation(cache::Dict{Any, CUDA.CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:159
 [11] macro expansion
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:373 [inlined]
 [12] macro expansion
    @ ./lock.jl:273 [inlined]
 [13] cufunction(f::var"#gpu_kern#20"{CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}}, tt::Type{Tuple{…}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Nothing})
    @ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:368
 [14] macro expansion
    @ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:112 [inlined]
 [15] (::KernelAbstractions.Kernel{…})(args::Val{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/CUDAKernels.jl:122

 [21] top-level scope
    @ REPL[29]:1
Some type information was truncated. Use `show(err)` to see complete types.