Are there any ways to create a static vector of variable length for each thread in a GPU kernel?
I tried this, but it didn't work:
using CUDA
using StaticArrays

function dosomething(n)
    # fails: @MVector needs the length at compile time, but n is a runtime argument
    vec = @MVector zeros(n)
    return nothing
end

@cuda threads=5 dosomething(n)
In my application, I need to create a few small local arrays (their size is determined by an argument of the kernel call) for some intermediate calculations. Are there any alternative ways to do this?
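For reference, one workaround I have been considering is passing the length as a compile-time value with Val, so the kernel gets specialized on it and a fixed-size MVector can be used. This is just a rough sketch of what I mean (the Val(n) trick and the Float32 element type are my own guesses, not something I have confirmed is the right approach):

using CUDA
using StaticArrays

# sketch: the length N is baked into the type via Val, so it is known at compile time
function dosomething(::Val{N}) where {N}
    vec = MVector{N, Float32}(undef)   # per-thread local array of static size N
    for i in 1:N
        vec[i] = 0f0                   # intermediate calculation would go here
    end
    return nothing
end

n = 4
@cuda threads=5 dosomething(Val(n))

The downside I see is that each new value of n triggers a fresh kernel compilation, which is why I am asking whether there are better alternatives.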