Hi, I would like to take a slice from a CuArray in a kernel:
using CUDA
using StaticArrays
using LinearAlgebra

function run()
    a::Float64 = 5.0
    n = 3
    vectors = CUDA.rand(Float64, 100, n)
    # vectors = CuArray([@SVector rand(n) for i in 1:100])
    results = CUDA.ones(100, n)  # CuArray{SVector{2,Float64},1}(undef, 100)
    transform = Diagonal(@SVector ones(n)) * a

    function linear_transform_kernel(vectors, ::Val{N}) where {N}
        i = threadIdx().x
        results[i, :] .= -vectors[i, :]
        # results[i, :] .= transform * vectors[i, :]
        return
    end

    @sync @cuda threads=100 linear_transform_kernel(vectors, Val(n))
    display(results)
end

run()
The slicing operation vectors[i,:] fails even in this simple example:
ERROR: LoadError: InvalidIRError: compiling MethodInstance for (::var"#linear_transform_kernel#12"{CuDeviceMatrix{Float32, 1}})(::CuDeviceMatrix{Float64, 1}, ::Val{3}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to ijl_alloc_array_1d)
Stacktrace:
[1] Array
@ ./boot.jl:477
[2] Array
@ ./boot.jl:486
[3] Array
@ ./boot.jl:494
[4] similar
@ ./abstractarray.jl:877
[5] similar
@ ./abstractarray.jl:876
[6] similar
@ ./broadcast.jl:224
[7] similar
@ ./broadcast.jl:223
[8] copy
@ ./broadcast.jl:928
[9] materialize
@ ./broadcast.jl:903
[10] broadcast_preserving_zero_d
@ ./broadcast.jl:892
[11] -
@ ./abstractarraymath.jl:218
[12] linear_transform_kernel
@ /nfs/c3po/home/ge78muc/terra-dg-group-1/cuda_exploration.jl:19
...
Interestingly, results[i,:] .= ... seems to work when vectors is a CuArray of SVectors.
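For reference, the SVector variant I mean looks roughly like this, pieced together from the commented-out lines above (exact types and the kernel name are just for this sketch, and results is passed as an argument instead of captured):

using CUDA, StaticArrays

n = 3
vectors = CuArray([@SVector rand(n) for _ in 1:100])  # CuVector of SVector{3,Float64}
results = CUDA.ones(100, n)

function svector_kernel(results, vectors)
    i = threadIdx().x
    # vectors[i] is a single SVector held in registers, so negating it
    # does not allocate; the broadcast then writes into the row view
    results[i, :] .= -vectors[i]
    return
end

@cuda threads=100 svector_kernel(results, vectors)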
What are some ways to get around this? From my understanding, accessing a row of a matrix inside a CUDA kernel should be a common need. Do you use a CuArray of StaticArrays then? But SVectors are immutable, which is undesirable for my use case…
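For concreteness, the one allocation-free workaround I can think of is spelling out the per-element loop by hand (sketch below, kernel name and the explicit results argument are mine), but I was hoping for something closer to the broadcast syntax:

function loop_kernel(results, vectors, ::Val{N}) where {N}
    i = threadIdx().x
    # copy the row element by element: no temporary slice is created,
    # so nothing has to be allocated on the device
    for j in 1:N
        @inbounds results[i, j] = -vectors[i, j]
    end
    return
end

@cuda threads=100 loop_kernel(results, vectors, Val(n))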