I’m encountering an error when running the following minimal example with --check-bounds=no. With bounds checking enabled (--check-bounds=yes), it works fine, but disabling it causes:
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Here is the MWE:
using CUDA

function test(A)
    A[1] = rand()
    return nothing
end

A = CUDA.zeros(1)
CUDA.@sync @cuda test(A)
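To rule out a mix-up between the two runs, the active bounds-checking mode can be printed from inside the session. This is a small sketch that relies on `Base.JLOptions()`, which is an unexported internal, so treat the field name and the integer encoding (0 = default, 1 = yes, 2 = no) as assumptions that may change between Julia versions.

```julia
# Read the bounds-checking mode that this Julia process was started with.
# Base.JLOptions() is internal API; the 0/1/2 encoding is assumed here.
mode = Base.JLOptions().check_bounds

# Map the integer code to the corresponding command-line setting.
label = mode == 0 ? "default" :
        mode == 1 ? "yes" :
        mode == 2 ? "no" : "unknown"

println("check-bounds mode: ", label)
```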
Is this error caused by using rand() inside the kernel?
Is it fundamentally incorrect to use rand() inside a @cuda kernel?
If so, what is the proper way to generate random numbers on the GPU?
Any clarification would be appreciated.
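For comparison, here is a sketch of a host-side alternative that avoids calling rand() inside the kernel altogether. It assumes that `CUDA.rand` and `Random.rand!` on a `CuArray` are the intended array-level way to fill device memory with random numbers; `copy_first!` is a hypothetical kernel name used only for illustration. It does not explain the error above, it only sidesteps the device-side call.

```julia
using CUDA
using Random  # provides rand! for in-place filling

# Fill an existing device array with uniform random numbers from the host side,
# so that no kernel written by hand has to call rand().
A = CUDA.zeros(Float32, 1)
rand!(A)

# Alternatively, allocate a new device array that is already filled.
B = CUDA.rand(Float32, 1)

# Hypothetical kernel that only consumes precomputed random values.
function copy_first!(dst, src)
    dst[1] = src[1]
    return nothing
end

C = CUDA.zeros(Float32, 1)
CUDA.@sync @cuda copy_first!(C, B)
```

If the random values are needed per thread, the same pattern extends to generating one array of the required length up front and indexing into it from the kernel.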
versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 384 × AMD EPYC 9654 96-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
Environment:
LD_LIBRARY_PATH = /apps/t4/rhel9/free/cudnn/9.8.0/cuda/12/lib:/apps/t4/rhel9/free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib:/apps/t4/rhel9/free/openmpi/5.0.2/nvhpc/lib:/apps/t4/rhel9/free/ucx/1.16.0-gcc/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/comm_libs/nvshmem/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/comm_libs/nccl/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/math_libs/lib64:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/extras/qd/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/lib64
JULIA_DEPOT_PATH = /gs/fs/private/.julia
CUDA.versioninfo()
CUDA runtime 12.6, local installation
CUDA driver 12.9
NVIDIA driver 570.124.6
CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+570.124.6
Julia packages:
- CUDA: 5.8.2
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5
Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6
Preferences:
- CUDA_Runtime_jll.version: 12.6
- CUDA_Runtime_jll.local: true
1 device:
0: NVIDIA H100 (sm_90, 92.489 GiB / 93.584 GiB available)