`check-bounds=no` causes illegal memory access when using `rand()` in CUDA kernel

I’m encountering an error when running the following minimal example with `--check-bounds=no`. With bounds checking enabled (`--check-bounds=yes`) it works fine, but disabling it causes:

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)

Here is the MWE:

using CUDA

function test(A)
    A[1] = rand()
    return nothing
end

A = CUDA.zeros(1)
CUDA.@sync @cuda test(A)
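
For completeness, the failure only shows up in sessions started with bounds checking globally disabled. The effective setting can be double-checked from inside Julia via Base.JLOptions (an internal struct, so this assumes its current layout): the check_bounds field is 0 for the default, 1 for forced on, and 2 for forced off.

Base.JLOptions().check_bounds   # expect 2 in the failing check-bounds=no session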

Is this error caused by using rand() inside the kernel?
Is it fundamentally incorrect to use rand() inside a @cuda kernel?
If so, what is the proper way to generate random numbers on the GPU?
Any clarification would be appreciated.
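
For reference, one workaround I’m aware of is to generate the random numbers outside the kernel with CUDA.rand and only write them inside the kernel. A minimal sketch mirroring the MWE above (fill_from! is just a hypothetical stand-in for the real kernel):

using CUDA

function fill_from!(A, vals)
    i = threadIdx().x           # one thread per element in this tiny example
    A[i] = vals[i]
    return nothing
end

A = CUDA.zeros(1)
vals = CUDA.rand(length(A))     # random Float32s generated via CURAND, outside the kernel
CUDA.@sync @cuda threads=length(A) fill_from!(A, vals)

I’d still like to understand whether kernel-side rand() itself is supported, though.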

versioninfo

Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
Environment:
  LD_LIBRARY_PATH = /apps/t4/rhel9/free/cudnn/9.8.0/cuda/12/lib:/apps/t4/rhel9/free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib:/apps/t4/rhel9/free/openmpi/5.0.2/nvhpc/lib:/apps/t4/rhel9/free/ucx/1.16.0-gcc/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/comm_libs/nvshmem/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/comm_libs/nccl/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/math_libs/lib64:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/extras/qd/lib:/apps/t4/rhel9/isv/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/lib64
  JULIA_DEPOT_PATH = /gs/fs/private/.julia

CUDA.versioninfo()

CUDA runtime 12.6, local installation
CUDA driver 12.9
NVIDIA driver 570.124.6

CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+570.124.6

Julia packages:
- CUDA: 5.8.2
- CUDA_Driver_jll: 0.13.0+0
- CUDA_Runtime_jll: 0.17.0+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.6
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA H100 (sm_90, 92.489 GiB / 93.584 GiB available)

This MWE also crashes with `check-bounds=no`:

using CUDA
N = 10^5
A = CUDA.rand(N)
B = CUDA.zeros(Int, N)
sortperm!(B, A)
100000-element CuArray{Int64, 1, CUDA.DeviceMemory}:
Error showing value of type CuArray{Int64, 1, CUDA.DeviceMemory}:

SYSTEM (REPL): showing an error caused an error
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
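
For what it’s worth, in a session where the call does go through (e.g. with bounds checking enabled), the result can be sanity-checked against the CPU; a small sketch:

using CUDA
N = 10^5
A = CUDA.rand(N)
B = CUDA.zeros(Int, N)
sortperm!(B, A)
@assert Array(A)[Array(B)] == sort(Array(A))   # the permutation should sort the values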

Additional information:
These MWEs work properly with CUDA.jl 5.7 but fail with CUDA.jl 5.8, so this appears to be a regression.
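
Until a fix lands, pinning the previous CUDA.jl release may be a workable stopgap (assuming 5.7 is otherwise compatible with your environment); a sketch:

using Pkg
Pkg.add(name="CUDA", version="5.7")   # then restart Julia so the older version is loaded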


Thanks for the MWE. I bisected and filed an issue: Illegal memory access after aligned_sizeof changes · Issue #2790 · JuliaGPU/CUDA.jl · GitHub

The issue is caused by CUDA.jl switching from sizeof to aligned_sizeof. If you want to help: it should be relatively easy to figure out which exact change causes the failure, and how it affects sortperm! here (since sizeof and aligned_sizeof are identical for the Float32 eltype used here, the switch by itself shouldn’t have any impact).
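
For anyone following along, the parenthetical is easy to check: the two quantities agree for the Float32 eltype (and for the Int64 eltype of the permutation), so the switch alone shouldn’t change any sizes here. Base.aligned_sizeof is an internal, non-exported helper, so this assumes it exists under that name on your Julia version:

sizeof(Float32), Base.aligned_sizeof(Float32)   # both 4
sizeof(Int64), Base.aligned_sizeof(Int64)       # both 8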
