A Ref() that doesn't escape its function still allocates on the heap

I recently learned that a Ref created inside a function, and which doesn’t escape that function, is stack-allocated instead of heap-allocated. However, when writing a CUDA kernel like this:

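```julia
using CUDA

function _kernel_0(trail, particles)
    # global 1-based thread index
    idx = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    # threads past the end of `particles` do nothing
    checkbounds(Bool, particles, idx) || return

    # copy the particle into a Ref so `motor` and `sense` can update it in place
    particle = Ref(@inbounds(particles[idx]))

    # `gpuhash`/`gpuhash_scale01` are my own hash helpers (not shown)
    rand = gpuhash(idx + particle[].position[1] + particle[].position[2]) |> gpuhash_scale01
    motor(trail, particle, rand)
    sense(trail, particle, rand)

    # write the updated particle back
    particles[idx] = particle[]

    nothing
end
```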

and looking at the `code_llvm` output, I find this:

```llvm
; ┌ @ refpointer.jl:134 within `Ref'
; │┌ @ refvalue.jl:10 within `RefValue' @ refvalue.jl:8
    %74 = bitcast {}*** %6 to i8*
    %75 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %74, i32 1424, i32 32) #4
    %76 = bitcast {}* %75 to i64*
```

It seems the compiler can’t prove that the Ref doesn’t escape `_kernel_0`, even though it doesn’t. This is probably because of the `motor` and `sense` calls: in fact, if I `@inline` those methods, the heap allocation disappears.
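The inlining effect can be reproduced on the CPU without CUDA. This is a minimal sketch in which `bump_noinline!` and `bump_inline!` are hypothetical stand-ins for `motor`/`sense`; the only difference between the two callers is whether the helper that mutates the Ref gets inlined. Exact byte counts depend on the Julia version, so none are claimed here.

```julia
# Hypothetical stand-ins for `motor`/`sense`: each mutates the Ref in place.
@noinline bump_noinline!(r::Base.RefValue{Float64}, x::Float64) = (r[] += x; nothing)
@inline bump_inline!(r::Base.RefValue{Float64}, x::Float64) = (r[] += x; nothing)

# `r` is passed to a call that is not inlined, so the optimizer typically
# treats it as escaping and keeps the heap allocation.
function with_noinline(x::Float64)
    r = Ref(x)
    bump_noinline!(r, 1.0)
    return r[]
end

# After inlining, every use of `r` is visible inside this function,
# so the allocation can typically be removed.
function with_inline(x::Float64)
    r = Ref(x)
    bump_inline!(r, 1.0)
    return r[]
end

# Run once so compilation is not counted in the measurement.
with_noinline(1.0); with_inline(1.0)

println("noinline: ", @allocated(with_noinline(1.0)), " bytes")
println("inline:   ", @allocated(with_inline(1.0)), " bytes")
```

To inspect the IR directly, `@code_llvm with_noinline(1.0)` (from InteractiveUtils) should show a GC allocation call similar to the `jl_gc_pool_alloc` above, while `with_inline` should not; for device code, CUDA.jl’s `@device_code_llvm` plays the same role.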

This makes me a bit uneasy, since I like to have some control over allocation. I know there are ways to avoid using a Ref altogether, but I was wondering why there is no unsafe way to force a stack allocation, i.e. to tell the compiler that a Ref doesn’t escape its calling function. Would this cause any complications? This matters especially when writing CUDA kernels.
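One of the Ref-free workarounds is to pass the particle by value and have each helper return an updated copy. A minimal sketch, assuming `Particle` is an immutable isbits struct (its real definition isn’t shown in the post) and with `motor` and `update!` as hypothetical stand-ins; it runs on the CPU, but the same value-threading pattern should carry over to a CUDA.jl kernel body.

```julia
# Hypothetical isbits particle type; the real one is not shown.
struct Particle
    position::NTuple{2,Float32}
    heading::Float32
end

# Hypothetical `motor`: returns an updated copy instead of writing through a Ref.
@noinline function motor(p::Particle, rnd::Float32)
    dx = cos(p.heading) * rnd
    dy = sin(p.heading) * rnd
    return Particle((p.position[1] + dx, p.position[2] + dy), p.heading)
end

# Kernel-body equivalent: read the value, thread it through the calls, write it back.
function update!(particles::Vector{Particle}, idx::Int, rnd::Float32)
    checkbounds(Bool, particles, idx) || return nothing
    p = @inbounds particles[idx]
    p = motor(p, rnd)          # no Ref is created, so there is nothing to escape
    @inbounds particles[idx] = p
    return nothing
end

particles = [Particle((0f0, 0f0), 0f0)]
update!(particles, 1, 1f0)
println(particles[1].position)
```

Because `Particle` is immutable, the compiler is free to keep `p` in registers even though `motor` is not inlined.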

From https://discourse.julialang.org/t/how-to-know-if-object-memory-resides-on-stack-or-heap/4927/8:

And there are cases that are currently considered escaping but may not be in the future:

  • used in function call to non-inlined function
  • stored to argument of non-inlined function
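A small illustration of why the first bullet is treated conservatively: from the caller’s side, a non-inlined callee’s signature says nothing about whether it keeps the pointer. This sketch uses a hypothetical `keep!` that stashes the Ref in a hypothetical global `STASH`; if `r` had been forced onto `leaky`’s stack, `STASH` would hold a dangling pointer once `leaky` returns.

```julia
# Hypothetical global where a callee can stash references.
const STASH = Base.RefValue{Float64}[]

# Hypothetical non-inlined callee whose body *does* let the Ref escape.
@noinline keep!(r::Base.RefValue{Float64}) = (push!(STASH, r); nothing)

function leaky(x::Float64)
    r = Ref(x)
    keep!(r)        # r now outlives this call frame via STASH
    return r[]
end

leaky(1.0)
println(STASH[end][])   # the Ref is still reachable after leaky returned
```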

But I really don’t understand this limitation, and I’m surprised that “the future” isn’t here yet, considering that post is from 2017.