Ref() which doesn't escape inside a function allocates on the heap

I recently learned that creating a Ref inside a function from which it doesn’t escape leads to a stack allocation instead of a heap allocation. But then, when writing a CUDA kernel like so -

function _kernel_0(trail, particles)
    idx = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    checkbounds(Bool, particles, idx) || return

    particle = Ref(@inbounds(particles[idx]))
    rand = gpuhash(idx + particle[].position[1] + particle[].position[2]) |> gpuhash_scale01
    motor(trail, particle, rand)
    sense(trail, particle, rand)

    particles[idx] = particle[]
    return nothing
end

and looking at the code_llvm output, there’s this:

; ┌ @ refpointer.jl:134 within `Ref'
; │┌ @ refvalue.jl:10 within `RefValue' @ refvalue.jl:8
    %74 = bitcast {}*** %6 to i8*
    %75 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %74, i32 1424, i32 32) #4
    %76 = bitcast {}* %75 to i64*

It seems the compiler can’t prove that the Ref doesn’t escape _kernel_0, even though it clearly doesn’t here. This is probably due to the calls to motor and sense. In fact, if I @inline those methods, the heap allocation seems to go away.
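To make the inlining effect easier to reproduce without a GPU, here is a minimal CPU-only sketch. The `Particle` struct and the `step_noinline!` / `step_inline!` functions are hypothetical stand-ins for my real particle type and for `motor` / `sense`; only the `@noinline` vs `@inline` annotation differs between them. I would expect the non-inlined variant to report a nonzero number of bytes and the inlined one to report zero, but the exact numbers depend on the Julia version, so I'm not quoting any output here.

```julia
# Hypothetical particle type, standing in for the real one.
struct Particle
    position::NTuple{2,Float32}
    heading::Float32
end

# Hypothetical stand-in for `motor`/`sense`: mutates the Ref in place, never inlined.
@noinline function step_noinline!(p::Base.RefValue{Particle}, r::Float32)
    old = p[]
    p[] = Particle(old.position, old.heading + r)
    return nothing
end

# Same body, but the compiler is asked to inline it into the caller.
@inline function step_inline!(p::Base.RefValue{Particle}, r::Float32)
    old = p[]
    p[] = Particle(old.position, old.heading + r)
    return nothing
end

function update_noinline(x::Particle, r::Float32)
    p = Ref(x)              # the Ref is passed to a non-inlined call
    step_noinline!(p, r)
    return p[]
end

function update_inline(x::Particle, r::Float32)
    p = Ref(x)              # the Ref is only used by inlined code
    step_inline!(p, r)
    return p[]
end

# Measure inside a function so that top-level boxing does not skew the result.
measure(f, x, r) = @allocated f(x, r)

x = Particle((1f0, 2f0), 0f0)
measure(update_noinline, x, 0.5f0)   # warm-up call so compilation is not measured
measure(update_inline, x, 0.5f0)
println("noinline: ", measure(update_noinline, x, 0.5f0), " bytes")
println("inline:   ", measure(update_inline, x, 0.5f0), " bytes")
```

Both variants compute the same result; the only thing being compared is whether the Ref has to exist as a real heap object.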

This makes me a bit uneasy, since I like to have some control over allocations. I know there are ways to avoid using a Ref, but I was wondering why there is no unsafe way to force a stack allocation, i.e. to tell the compiler that a Ref doesn’t escape the calling function’s stack frame. Would this cause any complications? It’s especially important when writing CUDA kernels.
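For context, the kind of workaround I have in mind is roughly the sketch below: instead of mutating a Ref, each step takes the immutable particle and returns an updated copy, so nothing mutable is ever handed to a non-inlined call. The `Particle` struct and the `motor_value` / `sense_value` names are made up for illustration and are not my actual code.

```julia
# Hypothetical particle type, standing in for the real one.
struct Particle
    position::NTuple{2,Float32}
    heading::Float32
end

# Hypothetical value-returning versions of `motor` and `sense`.
# They are marked @noinline on purpose: no Ref is involved, so inlining should not matter.
@noinline motor_value(p::Particle, r::Float32) = Particle(p.position .+ r, p.heading)
@noinline sense_value(p::Particle, r::Float32) = Particle(p.position, p.heading + r)

# The caller rebinds the local variable instead of mutating through a Ref.
function update(p::Particle, r::Float32)
    p = motor_value(p, r)
    p = sense_value(p, r)
    return p
end

p = update(Particle((1f0, 2f0), 0f0), 0.5f0)
println(p)
```

This works, but it means rewriting every helper to return its result, which is why I'd still like a way to keep the Ref-based style.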


And there are cases that’s currently considered escaping but may not in the future,

  • used in function call to non-inlined function
  • stored to argument of non-inlined function

but I really don’t understand this limitation, and I’m surprised that “the future” still hasn’t arrived, considering that was written in 2017.