Hi all,

I have some code that used to run perfectly fine with `--check-bounds=no`, but recently it started crashing with an illegal memory access error (code 700; full trace below). If I run it with `--check-bounds=yes`, it works without any issues.

I noticed that the stack trace mentions `link` (frame [8], in the CUDA/GPUCompiler compilation path), which makes me think it might be related. I'm posting here to see if anyone has insight into what's going on.
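For what it's worth, since the failure is reported inside `link` while compiling a seemingly unrelated kernel, I suspect the illegal access actually happened in an earlier kernel and is only being reported at the next synchronization point. Here's a rough sketch of how I've been trying to localize it; `steps` and `launch_step!` are placeholders for my real pipeline, not real API:

```julia
using CUDA

# Placeholders for my actual pipeline: `steps` is a collection of work items,
# `launch_step!` launches the corresponding kernel(s).
for (n, step) in enumerate(steps)
    launch_step!(step)
    CUDA.synchronize()  # force any asynchronous device error to surface here
    @info "step $n completed without a device error"
end
```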
This isn't the first time I've seen `--check-bounds=no` cause strange behavior; I ran into a different error in another package before. So now I'm wondering: is it just unsafe to use `--check-bounds=no` in general?
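To make concrete the kind of failure I mean, here's a minimal sketch (not my actual code) of how a latent off-by-one in a KernelAbstractions kernel behaves differently under the two flags: with `--check-bounds=yes` the bad access is caught as a device-side bounds error, while with `--check-bounds=no` the check is stripped and the out-of-bounds load can surface later as a code 700 illegal memory access:

```julia
using CUDA, KernelAbstractions

@kernel function shift_kernel!(dst, @Const(src))
    i = @index(Global, Linear)
    # Off-by-one bug: at i == length(src) this reads one element past the end.
    dst[i] = src[i + 1]
end

backend = CUDABackend()
src = CUDA.rand(Float32, 1024)
dst = similar(src)
shift_kernel!(backend)(dst, src; ndrange = length(dst))
KernelAbstractions.synchronize(backend)  # the error (if any) is reported here
```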
- I understand `--check-bounds=no` is commonly used for performance in production builds, but has something changed recently in Julia or in some libraries that makes it more fragile?
- Has anyone else experienced similar issues, or found reliable ways to avoid them? (A sketch of one pattern I'm considering follows this list.)
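On the mitigation question, one pattern I'm considering (just a sketch, not something I've validated at scale): since `--check-bounds=no` strips `@boundscheck` blocks globally but leaves ordinary control flow alone, an explicit `if` guard in the kernel should keep the access in range under either flag:

```julia
using KernelAbstractions

@kernel function safe_shift_kernel!(dst, @Const(src))
    i = @index(Global, Linear)
    # A plain `if` is ordinary control flow, so it survives --check-bounds=no,
    # whereas @boundscheck blocks would be removed by that flag.
    if i + 1 <= length(src)
        @inbounds dst[i] = src[i + 1]
    end
end
```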
Any thoughts or suggestions would be greatly appreciated.
```
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 384 × AMD EPYC 9654 96-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
julia> CUDA.versioninfo()
CUDA runtime 12.6, local installation
CUDA driver 12.9
NVIDIA driver 570.124.6
CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+570.124.6
Julia packages:
- CUDA: 5.8.1
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5
Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6
Preferences:
- CUDA_Runtime_jll.version: 12.6
- CUDA_Runtime_jll.local: true
1 device:
0: NVIDIA H100 (sm_90, 92.164 GiB / 93.584 GiB available)
```
```
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/libcuda.jl:30
[2] nonblocking_synchronize(val::CUDA.CuContext)
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:172
[3] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:184
[4] device_synchronize
@ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/synchronization.jl:178 [inlined]
[5] checked_cuModuleLoadDataEx(_module::Base.RefValue{Ptr{CUDA.CUmod_st}}, image::Ptr{UInt8}, numOptions::Int64, options::Vector{CUDA.CUjit_option_enum}, optionValues::Vector{Ptr{Nothing}})
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:18
[6] CUDA.CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:60
[7] CuModule
@ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/lib/cudadrv/module.jl:49 [inlined]
[8] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/compilation.jl:409
[9] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:270
[10] cached_compilation(cache::Dict{Any, CUDA.CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler /gs/fs/****/****/.julia/packages/GPUCompiler/fi1Sa/src/execution.jl:159
[11] macro expansion
@ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:373 [inlined]
[12] macro expansion
@ ./lock.jl:273 [inlined]
[13] cufunction(f::var"#gpu_kern#20"{CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}, CUDA.CuDeviceVector{…}}, tt::Type{Tuple{…}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Nothing})
@ CUDA /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:368
[14] macro expansion
@ /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/compiler/execution.jl:112 [inlined]
[15] (::KernelAbstractions.Kernel{…})(args::Val{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
@ CUDA.CUDAKernels /gs/fs/****/****/.julia/packages/CUDA/LhtzZ/src/CUDAKernels.jl:122
[21] top-level scope
@ REPL[29]:1
Some type information was truncated. Use `show(err)` to see complete types.
```