Hi! I’d like to modify an MVector in the context of CUDA Dynamic Parallelism. Here is a minimal example:
using CUDA, StaticArrays

function outer!()
    v = MVector{1, Float32}(undef)
    @cuda dynamic = true inner!(v)
    nothing
end

function inner!(v)
    v[1] = 0.0f0
    nothing
end

@cuda outer!()
I receive the following error:
ERROR: Failed to compile PTX code (ptxas exited with code 255)
Invocation arguments: --generate-line-info --compile-only --verbose --gpu-name sm_70 --output-file /tmp/jl_WVUB2l0xml.cubin /tmp/jl_m9gxUniPUZ.ptx
ptxas /tmp/jl_m9gxUniPUZ.ptx, line 43; error : Parameter to entry function cannot be an incomplete array.
ptxas /tmp/jl_m9gxUniPUZ.ptx, line 298; error : Parameter to entry function cannot be an incomplete array.
ptxas fatal : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /tmp/jl_m9gxUniPUZ.ptx
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compile(job::GPUCompiler.CompilerJob)
@ CUDA /.julia/packages/CUDA/htRwP/src/compiler/compilation.jl:356
[3] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler /.julia/packages/GPUCompiler/U36Ed/src/execution.jl:125
[4] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
@ GPUCompiler /.julia/packages/GPUCompiler/U36Ed/src/execution.jl:103
[5] macro expansion
@ /.julia/packages/CUDA/htRwP/src/compiler/execution.jl:367 [inlined]
[6] macro expansion
@ ./lock.jl:267 [inlined]
[7] cufunction(f::typeof(outer!), tt::Type{Tuple{}}; kwargs::@Kwargs{})
@ CUDA /.julia/packages/CUDA/htRwP/src/compiler/execution.jl:362
[8] cufunction(f::typeof(outer!), tt::Type{Tuple{}})
@ CUDA /.julia/packages/CUDA/htRwP/src/compiler/execution.jl:359
[9] top-level scope
@ /.julia/packages/CUDA/htRwP/src/compiler/execution.jl:112
[10] top-level scope
@ /.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
Is there a way to do this?