GC incurs some overhead. It’d be nice if we didn’t have to GC everything. In some cases, we can detect at compile time exactly when memory needs to be freed. If so, would it be a good idea to free it deterministically at that point? Julia is a language that wants to have its cake and eat it too.
We already do this for everything except arrays, although the analysis isn’t interprocedural, so it can miss some “obvious” cases.
I don’t think we do?
We do SROA, i.e. scalar-replacement-of-aggregates, and the scalar replacement lives in registers and maybe spills to the stack.
Afaiu we never do stack-alloc, where we know that the lifetime is bounded by the stackframe and we allocate a fully formed object on the stack (please correct me if I’m wrong or this changed!).
Afaiu we don’t do eager deterministic de-alloc, i.e. the compiler never generates a matching free to the malloc / ijl_gc_pool_alloc.
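To make the SROA case concrete, here is a minimal sketch. The type Acc and the functions sumsq and measure_sumsq are hypothetical names made up for this illustration. The object never leaves the function, so the compiler is free to keep its fields in registers; whether it actually does depends on the Julia version, so the printed number is something to check rather than a guarantee.

```julia
# Hypothetical mutable type, used only for this illustration.
mutable struct Acc
    x::Float64
    y::Float64
end

function sumsq(x, y)
    a = Acc(x, y)   # `a` never leaves this function
    a.x *= 2        # mutation is fine as long as `a` does not escape
    return a.x^2 + a.y^2
end

# Measure inside a function, after a warm-up call, so that compilation
# and global-scope effects are not counted.
measure_sumsq(x, y) = (sumsq(x, y); @allocated sumsq(x, y))

println("bytes allocated by sumsq: ", measure_sumsq(1.0, 2.0))
```

If scalar replacement kicks in, the reported allocation should be small or zero; a nonzero value would suggest the object was materialized on the heap.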
An example is the following:
julia> @noinline g(r) = (r[] += 1);
julia> f(i)=begin r = Ref(i); g(r); r[] end
In a perfect world, g would have inferred effects that imply that r doesn’t leak. But we requested the function @noinline, so we cannot use scalar replacement.
So f could place the Ref on the stack instead of the heap; but it would still require an object header.
Or f could allocate the Ref on the heap, using jl_gc_pool_alloc and then free it upon return from g. But we don’t do that. (we are already paying the price of expensive allocations for non-compacting GC; might as well reap the benefits of opportunistic early free)
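One way to see the difference on your own machine is to compare the example above against a variant whose callee may be inlined. This is only a sketch: g_inl, f_inl and allocs_of are hypothetical helpers that are not part of the original example, and the numbers depend on the Julia version.

```julia
# The example from above: g is kept out of line, so r has to exist as an object.
@noinline g(r) = (r[] += 1)
f(i) = begin r = Ref(i); g(r); r[] end

# Hypothetical comparison variant: same logic, but the callee may be inlined,
# which gives scalar replacement a chance to remove the Ref.
@inline g_inl(r) = (r[] += 1)
f_inl(i) = begin r = Ref(i); g_inl(r); r[] end

# Warm up first, then measure inside a function so that compilation and
# global-scope effects are not counted.
allocs_of(fn, x) = (fn(x); @allocated fn(x))

println("noinline callee:  ", allocs_of(f, 1), " bytes")
println("inlinable callee: ", allocs_of(f_inl, 1), " bytes")
```

If the noinline version reports an allocation and the inlinable one does not, that is the Ref being heap-allocated only because g is opaque to the function-local analysis.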
Related Feature request: unsafe_free!
See the example I gave here:
We have a pass, llvm-alloc-opt, that moves objects from the heap to the stack. When we do this we eliminate the object, which leads to some limitations.
Firstly, we don’t allocate a tag anymore, so this also requires us to be able to “see” and thus forward all calls to typeof.
Furthermore, the pass is function-local (LLVM) and does not take IPO-derived information into account. We have an IPO escape analysis, but it has been challenging to wire up and feed that IPO information to the LLVM pass.
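To check whether the allocation survived optimization for a particular function, one can dump the optimized LLVM IR and look for a call into the GC allocator. The sketch below reuses the f and g example from earlier in the thread. The regex is only a heuristic of mine: the allocator entry points are named differently across Julia versions, so a miss does not prove the allocation is gone.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

@noinline g(r) = (r[] += 1)
f(i) = begin r = Ref(i); g(r); r[] end

# Capture the optimized LLVM IR for f(::Int) as a string.
ir = sprint(io -> code_llvm(io, f, (Int,)))

# Heuristic match for GC allocation calls such as ijl_gc_pool_alloc;
# the exact symbol name is version-dependent (assumption).
has_gc_alloc = occursin(r"gc_\w*alloc", ir)

println("GC allocation call visible in IR: ", has_gc_alloc)
```

Reading the IR directly (for example with @code_llvm f(1) in the REPL) is more reliable than the string match when the result looks surprising.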
One of the capabilities that we are missing is to allocate a full Julia object on the stack (including the tag) and have the GC understand this. This should be doable, but handling write barriers is a bit annoying.
We must guarantee that an alloca address does not escape. If we have a write barrier and the parent is stack-allocated, we might accidentally push the address into the remset.
In order to handle that we would need to use one of the age/GC bits to represent “always young”/“ephemeral”.
So in order to level up escape analysis we need to improve the GC/runtime first.