I recently learnt that creating a `Ref` inside a function from which it doesn't escape leads to a stack allocation instead of a heap allocation. But then when writing a CUDA kernel like so:
```julia
function _kernel_0(trail, particles)
    idx = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    checkbounds(Bool, particles, idx) || return
    particle = Ref(@inbounds(particles[idx]))
    rand = gpuhash(idx + particle[].position[1] + particle[].position[2]) |> gpuhash_scale01
    motor(trail, particle, rand)
    sense(trail, particle, rand)
    particles[idx] = particle[]
    nothing
end
```
and looking at the `@code_llvm` output, there's this:
```llvm
; ┌ @ refpointer.jl:134 within `Ref'
; │┌ @ refvalue.jl:10 within `RefValue' @ refvalue.jl:8
   %74 = bitcast {}*** %6 to i8*
   %75 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %74, i32 1424, i32 32) #4
   %76 = bitcast {}* %75 to i64*
```
It seems the compiler can't infer that the `Ref` doesn't escape `_kernel_0`'s stack, even though that is the case here. This is probably because the `motor` and `sense` methods are opaque call boundaries. In fact, if I `@inline` those methods, the heap allocation disappears.
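The same effect can be reproduced on the CPU with a minimal sketch (`bump!` stands in for `motor`/`sense`; the names are illustrative): a `Ref` passed to a call the compiler cannot see through is conservatively assumed to escape and gets heap-allocated, while inlining the callee lets escape analysis prove the `Ref` never leaves the frame.

```julia
# `@noinline` forces an opaque call boundary; `@inline` removes it.
@noinline bump!(r) = (r[] += 1; nothing)
@inline bump_inlined!(r) = (r[] += 1; nothing)

function maybe_escapes(x)
    r = Ref(x)
    bump!(r)          # opaque call: `r` may escape, so it is heap-allocated
    return r[]
end

function no_escape(x)
    r = Ref(x)
    bump_inlined!(r)  # inlined: `r` provably stays local to this frame
    return r[]
end
```

Comparing the two with `@code_llvm` or `@allocated` shows whether the `jl_gc_pool_alloc` call survives.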
Now this makes me a bit uneasy, since I always want some measure of control. I know there are ways to avoid using a `Ref`, but I was wondering: why is there no unsafe way to force a stack allocation, i.e. some way to tell the compiler that a `Ref` does not escape its calling function's stack? Would that have complications? This is especially important when writing CUDA kernels.
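For reference, the `Ref`-free workaround I had in mind looks roughly like this (a CPU-side sketch; `Particle`, `move`, and `step!` are all hypothetical names): rewrite the mutating helpers as value-returning functions over an isbits struct, and write the result back to the array once at the end.

```julia
# An immutable, isbits particle: loads and stores copy it by value.
struct Particle
    position::NTuple{2,Float64}
end

# Hypothetical value-returning variant of something like `motor`.
move(p::Particle, r) = Particle((p.position[1] + r, p.position[2]))

function step!(particles, idx, r)
    p = @inbounds particles[idx]   # load by value, no Ref needed
    p = move(p, r)                 # pure update returns a new Particle
    @inbounds particles[idx] = p   # single write-back
    return
end
```

Since nothing here is a mutable box, there is no allocation for the compiler to prove away in the first place, regardless of inlining.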