I get “too many resources requested for launch” in a CUDA.jl kernel when I try to either
set a value in an array stored in global memory, like
mainWorkQueue[1,1]=1
OR print anything using
CUDA.@cuprint "aa"
I suspect that it is because the number of registers used is too high, so:
- I suppose that in this case it may be related to caching results fetched from global memory - can I switch that off?
- Can I run GC on variables in register memory, so I could manually clear them before the problem arises? (CUDA.unsafe_free!() seems to work on arrays only - am I wrong?)
- How can I increase maxrregcount in CUDA.jl?
Please don’t double-post. Here’s my response from Slack:
can’t you just launch fewer threads? it’s recommended to use the occupancy API to have it automatically decide how many threads to launch
a register limit can be set with the maxregs argument to @cuda, but it’ll result in spilling, so your kernel will be slow
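For reference, a minimal sketch of what that would look like (assuming a kernel named my_kernel!; the limit of 32 and the launch size are just example values):

```julia
using CUDA

# Cap register usage per thread; anything above the limit spills to
# local memory, which usually makes the kernel slower.
@cuda threads=256 maxregs=32 my_kernel!(mainWorkQueue)
```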
you need to try and reduce register pressure, e.g., by avoiding converting values between types, or throwing exceptions, etc. inspect the generated code for that
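To inspect what the compiler generates, CUDA.jl’s reflection tools can help; a sketch, again assuming a placeholder my_kernel!:

```julia
using CUDA

# Dump the generated PTX to spot type conversions, exception branches, etc.
@device_code_ptx @cuda launch=false my_kernel!(mainWorkQueue)

# Or compile without launching and query the per-thread register count.
kernel = @cuda launch=false my_kernel!(mainWorkQueue)
CUDA.registers(kernel)
```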
GC doesn’t apply at all here
as a last-resort solution you can try splitting the kernel
but you probably don’t need all that, just make sure your kernel can handle any thread count and use the occupancy API
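Roughly, that pattern looks like this (a sketch, not your actual code: my_kernel! and its bounds check are placeholders, and mainWorkQueue is assumed to be a CuArray):

```julia
using CUDA

# The kernel guards against out-of-range indices, so any thread count works.
function my_kernel!(mainWorkQueue)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(mainWorkQueue, 2)
        mainWorkQueue[1, i] = 1
    end
    return nothing
end

# Compile without launching, then let the occupancy API pick the launch size.
kernel = @cuda launch=false my_kernel!(mainWorkQueue)
config = launch_configuration(kernel.fun)
threads = min(size(mainWorkQueue, 2), config.threads)
blocks = cld(size(mainWorkQueue, 2), threads)
kernel(mainWorkQueue; threads, blocks)
```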