The issue here is that you’re trying to allocate memory, which is not allowed inside kernels. Calling plain rand(), which returns a single scalar and allocates nothing, works fine:
using CUDA
using CUDA: i32
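function rand_kernel!(x)
    # global (1-based) index of this thread
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    # grid-stride loop, so any launch configuration covers all of x
    while i <= length(x)
        x[i] = rand()
        i += blockDim().x * gridDim().x
    end
    return
end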
julia> x = CuArray{Float32}(undef, 1000);
julia> @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.76815176
0.33985865
0.6124729
0.4942757
julia> @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.79725224
0.9650727
0.32230678
0.43226108
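Since x holds Float32 values, it may be worth asking for that type directly, so that no Float64 is generated and then converted on assignment. A minimal sketch, assuming the in-kernel rand accepts a type argument the same way the host version does; rand_kernel_typed! is a name made up for this example:

```julia
using CUDA
using CUDA: i32

# Hypothetical variant of rand_kernel! that draws in the array's element type.
function rand_kernel_typed!(x::AbstractVector{T}) where {T}
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand(T)  # assumption: rand(T) is supported in device code for this T
        i += blockDim().x * gridDim().x
    end
    return
end

x = CuArray{Float32}(undef, 1000)
@cuda threads=128 blocks=8 rand_kernel_typed!(x)
```

The launch is the same as before; only the element type of the generated numbers changes.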
If you want the equivalent of CUDA.rand(2) inside the kernel, you can use StaticArrays.jl, whose SVector is an immutable, fixed-size value that needs no heap allocation:
using StaticArrays
function rand_kernel_2!(y) # y is an n x 2 matrix
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= size(y, 1)
        y[i, :] .= rand(SVector{2, Float32})
        i += blockDim().x * gridDim().x
    end
    return
end
julia> y = CuArray{Float32}(undef, 1000, 2);
julia> @cuda threads=128 blocks=8 rand_kernel_2!(y); y[1:4, :]
4×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
0.347162 0.344527
0.0153273 0.102394
0.195802 0.12623
0.0855174 0.838125
julia> @cuda threads=128 blocks=8 rand_kernel_2!(y); y[1:4, :]
4×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
0.354968 0.884167
0.304241 0.496749
0.872224 0.0735521
0.807519 0.20668
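If you would rather not add a dependency, the same row can presumably be filled by drawing one scalar per column. This is only a sketch under that assumption, with rand_kernel_cols! as a made-up name, and it again assumes rand(Float32) is available in device code:

```julia
using CUDA
using CUDA: i32

# Hypothetical alternative to rand_kernel_2!: one scalar draw per column.
function rand_kernel_cols!(y)  # y is an n x k matrix
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= size(y, 1)
        for j in 1:size(y, 2)
            y[i, j] = rand(Float32)  # scalar draw, no allocation
        end
        i += blockDim().x * gridDim().x
    end
    return
end

y = CuArray{Float32}(undef, 1000, 2)
@cuda threads=128 blocks=8 rand_kernel_cols!(y)
```

This avoids both the temporary vector and the broadcast, at the cost of being less compact than the SVector version.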
CUDA.seed! does not make the in-kernel rand() reproducible:
julia> CUDA.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.6673033
0.6596097
0.872357
0.2317881
julia> CUDA.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.75517017
0.7023138
0.2308588
0.790027
but you can use Random.seed! on the host (this needs using Random):
julia> Random.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.10930445
0.16218866
0.6950794
0.41934755
julia> Random.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.10930445
0.16218866
0.6950794
0.41934755
or call it inside the kernel:
function rand_kernel_seed!(x)
    Random.seed!(42)
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand()
        i += blockDim().x * gridDim().x
    end
    return
end
julia> @cuda threads=128 blocks=8 rand_kernel_seed!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.4078151
0.8522431
0.5209861
0.35377088
julia> @cuda threads=128 blocks=8 rand_kernel_seed!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
0.4078151
0.8522431
0.5209861
0.35377088
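Hard-coding 42 in the kernel is mostly useful for a demonstration. A small variation is to pass the seed as a kernel argument, so that the host decides it per launch. A minimal sketch, assuming the in-kernel Random.seed! accepts an integer variable just as it accepts the literal above; rand_kernel_seeded! and its seed parameter are names made up for this example:

```julia
using CUDA, Random
using CUDA: i32

# Hypothetical variant of rand_kernel_seed! that takes the seed as an argument.
function rand_kernel_seeded!(x, seed)
    Random.seed!(seed)  # seeds the device-side generator for this launch
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand()
        i += blockDim().x * gridDim().x
    end
    return
end

x = CuArray{Float32}(undef, 1000)
@cuda threads=128 blocks=8 rand_kernel_seeded!(x, 42)
a = Array(x)  # copy the first result to the host
@cuda threads=128 blocks=8 rand_kernel_seeded!(x, 42)
b = Array(x)  # copy the second result to the host
```

With the same seed and the same launch configuration, a and b should match, as in the session above; changing threads or blocks changes which thread writes which element, so the values should not be expected to carry over.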
This is also described in the CUDA.jl documentation.