Pointer to a stack frame in Julia, VLA/alloca-like; e.g. with StaticStrings.jl

I’m optimizing allocations away for printing general strings. One of the last remaining ones comes from what is effectively a Libc.malloc(200) in Julia, and it could live on the stack if Julia had that capability, as C does.

It IS dangerous in general to get a pointer to the stack (the stack frame may go away), so I think this is simply not possible in Julia, for good reasons. But is there a workaround? One would be translating some functions in Julia Base to C.

@mkitti, I’m now also looking into StaticStrings.jl (GitHub: mkitti/StaticStrings.jl, fixed-length strings in Julia represented by NTuples). InlineStrings.jl is another option.

julia> cs = cstatic"Hello world!\n"
cstatic"Hello world!\n"13

julia> @time ccall(:printf, Cint, (Ptr{Cchar},), cs);
Hello world!
  0.000037 seconds (2 allocations: 48 bytes)

Still two allocations. Curiously, print is way worse (also much worse than with Julia's default String):

julia> @time print(cs);
Hello world!
  0.000237 seconds (28 allocations: 752 bytes)

julia> @time ccall(:printf, Cint, (Ptr{CStaticString{13}},), Ref(cs));
Hello world!
  0.000043 seconds (2 allocations: 48 bytes)

julia> @time Ref(cs);
  0.000015 seconds (1 allocation: 32 bytes)

Starting from @edit Ref(cs) and then @edit Base.RefValue{CStaticString{13}}(cs), I believe I tracked it down to here:

mutable struct RefValue{T} <: Ref{T}
    x::T
    RefValue{T}() where {T} = new()
    RefValue{T}(x) where {T} = new(x)
end

I believe the allocation comes from Julia constructing that RefValue (the latter new): the string itself is on the stack, and the pointer needs a mutable struct, i.e. something on the heap, to point to.
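A quick way to sanity-check that reasoning, as a sketch rather than a diagnosis: the tuple of bytes below is only a stand-in for the payload of a CStaticString (its exact layout is an assumption here), and the checks use nothing beyond Base reflection functions (ismutabletype needs Julia 1.7 or later).

```julia
# Stand-in for the string payload (assumption: a plain tuple of 13 bytes).
payload = ntuple(i -> UInt8('a' + i - 1), 13)

r = Ref(payload)                              # what gets passed to ccall as Ptr{T}

payload_is_bits = isbitstype(typeof(payload)) # the tuple itself is a plain bits value
ref_is_refvalue = r isa Base.RefValue         # Ref(x) wraps it in a Base.RefValue
ref_is_mutable  = ismutabletype(typeof(r))    # RefValue is a mutable struct
roundtrip       = r[] === payload             # the contents are copied unchanged
```

A mutable struct has identity and a stable address, which is what makes it a valid target for a Ptr; whether the compiler can keep a particular RefValue off the heap is an optimization detail and not something the code above proves either way.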

Where is the other allocation coming from? I think it must be from :printf, but where exactly is that, in Libc? I haven’t tracked it down yet; at least I missed it in Julia, if it is defined there.
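One thing worth ruling out, offered as a hedged sketch: @time at global scope with a non-const global argument may itself contribute allocations (for example by boxing the returned value), independent of what the C function does. Measuring inside a function after a warm-up call separates the two. The helper names c_strlen and measure are hypothetical, and strlen is used instead of printf only to keep the output quiet; the symbol lookup assumes a platform where libc symbols are visible to the process, as on Linux or macOS.

```julia
# Hypothetical helper: wrapping the ccall gives the compiler concrete argument types.
c_strlen(s::String) = ccall(:strlen, Csize_t, (Cstring,), s)

# Hypothetical helper: measure only the call, not compilation or global-scope overhead.
function measure(s::String)
    c_strlen(s)                     # warm-up call so compilation is not measured
    return @allocated c_strlen(s)   # bytes allocated by the second call only
end

n     = c_strlen("Hello world!\n")
bytes = measure("Hello world!\n")
```

If bytes comes out as 0 here while the global-scope @time still reports allocations, that would point at the measurement context rather than at ccall or libc.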

With StaticCompiler.jl, if I recall correctly, you get C strings without allocations, but then you use Libc directly (maybe a good thing), and neither threads nor calls into libuv are supported (a bad thing). I also don’t want the strings to be on the heap, or unidiomatic code (using Libc.malloc directly or indirectly).

I have seen people use

julia> @inline foo(n) = Core.Intrinsics.llvmcall(""" 
       %ptr = alloca i8, i32 %0, align 16
       %res = ptrtoint ptr %ptr to i64
       ret i64 %res
       """, UInt64, Tuple{Int32}, convert(Int32,n)))

This feels deeply wrong to me.

OTOH, it might actually be correct? Maybe ask somebody who is more knowledgeable about LLVM?

It depends on the interaction between the alwaysinline attribute and object lifetimes. Prepare for language-lawyering.

I hate hate hate natural-language specifications without reference implementation, and I especially hate the culture that unclear borderline cases don’t get cleaned up by example (as part of the spec!).


Pretty sure this won’t work: llvmcall creates a function, and it doesn’t matter whether it’s inlined; the alloca’s lifetime ends at the return. There is no way to do an alloca, afaik.


You need to consider whether optimization passes on foo run before getting inlined into its caller (bad! The alloca goes away because it’s unused!) or after (yay! the alloca’s lifetime has been automatically extended until either the caller returns or the point of last use).

Afaict this pattern actually works. But I agree that it is an abomination, that it will 100% crash and burn in the interpreter, and that it is highly unclear whether this is supposed to work or rather an abuse of lazy implementation details in llvm and will go away at any point in time.

Note that Julia does set the alwaysinline attribute for llvmcall.

I see Julia already doesn’t track all allocations (even though it could for this):

julia> @allocated Libc.malloc(200)
0

So most likely it doesn’t for ccall either (if some C or e.g. Rust code allocates).
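Since such memory is invisible to both @allocated and the GC, freeing it is entirely manual. A minimal sketch of pairing the two calls, assuming a hypothetical helper name with_scratch (not from any package):

```julia
# Hypothetical helper: run f on a malloc'ed scratch buffer and always free it.
function with_scratch(f, nbytes::Integer)
    p = Ptr{UInt8}(Libc.malloc(nbytes))      # untracked by the GC and by @allocated
    p == C_NULL && throw(OutOfMemoryError())
    try
        return f(p)
    finally
        Libc.free(p)                         # runs on both the normal and the error path
    end
end

total = with_scratch(200) do p
    for i in 1:200
        unsafe_store!(p, UInt8(i), i)        # fill the buffer with 1, 2, ..., 200
    end
    sum(i -> Int(unsafe_load(p, i)), 1:200)  # read it back before it is freed
end
```

The pointer must not escape the do block: once with_scratch returns, the buffer is gone, which is the same lifetime hazard as a stack pointer, just made explicit.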

Would people actually want a fully allocation-free print/println? I might be obsessing over this; the allocation/malloc below isn’t relatively that costly, because it’s not GC-tracked, i.e. it is freed immediately after. I just like seeing no allocations; they are an eyesore and a distraction if you’re trying to get to 0 for other stuff (and maybe have some prints in for debugging).

Can you call a function from such an llvmcall (Julia’s? Or C? Likely neither)? I.e. the following uv_req_set_data; and, well, you would have to implement all of that loop in it too, or just do it in C.

I thought I was actually on my way to fix that last allocation with alloca, but this would be in addition to the 1-2 I still see.

I thought this function had a memory leak, because the free is only on the error path, but the other free is in the caller, uv_write. Maybe a bad code pattern…

julia> @time p=Ptr{Cchar}
  0.000002 seconds

julia> @time ccall(:printf, Cint, (p,), cs);
Hello world!
  0.000042 seconds (2 allocations: 48 bytes)

Where is that other allocation coming from (or, well, both)? Does ccall inherently allocate?

julia> @edit ccall(:printf, Cint, (p,), cs);  # You can't see it, since ccall isn't a function but a keyword that only looks like one
ERROR: UndefVarError: `ccall` not defined

No, it does not work. The compiler keeps track of inlined functions and puts stack restores after each of them.

julia> @generated function alloca(::Type{T}, N, ::Val{A} = Val{sizeof(T)}()) where {A,T}
           ptyp = string('i', 8sizeof(Int))
           instrs = """
                          %ptr = alloca i8, i32 %0, align $A
                          %iptr = ptrtoint i8* %ptr to $ptyp
                          ret $ptyp %iptr
                      """
           quote
               $(Expr(:meta,:inline))
               Base.llvmcall($instrs, Ptr{$T}, Tuple{UInt32}, N%UInt32 * $(sizeof(T)%UInt32))
           end
       end;

julia> let 
           p1 = alloca(Int, 10)
           p2 = alloca(Int, 10)
           p3 = alloca(Int, 10)
           p1 === p2 === p3 # this is catastrophically bad! 
       end
true

Inlining is unfortunately not sufficient.

Here’s the LLVM showing what’s happening:

julia> code_llvm((Int,)) do n
           p1 = alloca(Int, n)
           p2 = alloca(Int, n)
           p1, p2
       end
; Function Signature: var"#23"(Int64)
;  @ REPL[20]:2 within `#23`
define void @"julia_#23_1968"(ptr noalias nocapture noundef nonnull sret([2 x i64]) align 8 dereferenceable(16) %sret_return, i64 signext %"n::Int64") #0 {
top:
; ┌ @ REPL[16]:1 within `alloca` @ REPL[16]:1
; │┌ @ REPL[16]:10 within `macro expansion`
; ││┌ @ int.jl:88 within `*`
     %0 = shl i64 %"n::Int64", 3
; ││└
    %savedstack = call ptr @llvm.stacksave()
    %1 = and i64 %0, 4294967288
    %ptr.i = alloca i8, i64 %1, align 8
    %iptr.i = ptrtoint ptr %ptr.i to i64
    call void @llvm.stackrestore(ptr %savedstack)
; └└
;  @ REPL[20]:3 within `#23`
; ┌ @ REPL[16]:1 within `alloca` @ REPL[16]:1
; │┌ @ REPL[16]:10 within `macro expansion`
    %ptr.i4 = alloca i8, i64 %1, align 8
    %iptr.i5 = ptrtoint ptr %ptr.i4 to i64
; └└
;  @ REPL[20]:4 within `#23`
  store i64 %iptr.i, ptr %sret_return, align 8
  %"new::Tuple.sroa.2.0.sret_return.sroa_idx" = getelementptr inbounds i8, ptr %sret_return, i64 8
  store i64 %iptr.i5, ptr %"new::Tuple.sroa.2.0.sret_return.sroa_idx", align 8
  ret void
}

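Notice that the inlined alloca body is wrapped: a @llvm.stacksave() is emitted before it and a @llvm.stackrestore(ptr %savedstack) after it (visible around the first call in the listing above). This makes the stack behave exactly as if the function hadn’t been inlined, so the pointer is already dangling by the time the caller gets to use it.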
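Given that, the pattern that stays within documented Julia semantics is a mutable box plus GC.@preserve. The sketch below is an illustration under assumptions, not a guaranteed stack allocation: sum_bytes is a hypothetical name, and whether the compiler manages to keep buf off the heap depends on its optimizations.

```julia
# Hypothetical example: get a raw byte pointer to a fixed-size buffer without alloca.
function sum_bytes(bytes::NTuple{N,UInt8}) where {N}
    buf = Ref(bytes)              # mutable box holding a copy of the tuple
    total = 0
    GC.@preserve buf begin        # keep buf alive while the raw pointer is in use
        p = Ptr{UInt8}(Base.unsafe_convert(Ptr{NTuple{N,UInt8}}, buf))
        for i in 1:N
            total += Int(unsafe_load(p, i))
        end
    end                           # p must not be used past this point
    return total
end

result = sum_bytes((0x01, 0x02, 0x03))
```

Unlike the alloca trick, the lifetime here is stated explicitly: the pointer is valid exactly for the extent of the GC.@preserve block.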
