Eager finalization and smart pointers

Smart pointers are pointers that free the memory they manage when they are no longer needed. In Julia, we usually rely on the garbage collector for this purpose. However, we sometimes work with memory that is not managed by the garbage collector, particularly when using the C foreign function interface. For example, consider `unique_ptr` from the C++ standard library.
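For concreteness, here is a minimal sketch of the current idiom for C-allocated memory: wrap the pointer in a mutable struct and register a `Libc.free` finalizer, so the garbage collector frees it eventually (the `MallocBuffer` name is just for illustration):

```julia
# Sketch of the current idiom: let the GC free C-allocated memory
# via a finalizer. `MallocBuffer` is a hypothetical name.
mutable struct MallocBuffer
    ptr::Ptr{Cvoid}
    function MallocBuffer(nbytes::Integer)
        buf = new(Libc.malloc(nbytes))
        # Free the C allocation whenever the GC collects `buf`.
        finalizer(b -> Libc.free(b.ptr), buf)
        return buf
    end
end

buf = MallocBuffer(64)  # freed eventually, whenever the GC decides
```

The catch, of course, is "eventually": the free happens whenever the GC runs, which is exactly what eager finalization could improve.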

Julia 1.9 implements the eager finalization suggestion by @jpsamaroo via #45272.

This suggests to me that we should now implement smart pointers. For example, we could implement a `UniquePointer` as follows.

struct UniquePointer{T} <: Ref{T}
   deleter
   ptr::Ptr{T}
   deleted::Base.RefValue{Bool}
   # The finalizer is attached to the RefValue since
   # UniquePointer is immutable.
   # Accept a function first to support do syntax.
   function UniquePointer(deleter, ptr)
      self = new{eltype(ptr)}(deleter, ptr, Ref(false))
      # Capture `deleter` and `ptr` directly rather than `self`,
      # so the closure does not keep `self` reachable.
      finalizer(self.deleted) do deleted
          deleted[] || deleter(ptr)
          deleted[] = true
      end
      return self
   end
end
UniquePointer(ptr) = UniquePointer(Libc.free, ptr)
Base.unsafe_load(up::UniquePointer, args...) = unsafe_load(up.ptr, args...)
Base.unsafe_store!(up::UniquePointer, args...) = unsafe_store!(up.ptr, args...)
deleter(up::UniquePointer) = up.deleter

"""
Release the `Ptr` from management and return the pointer.
"""
function release(up::UniquePointer)
   up.deleted[] = true
   return up.ptr
end

My understanding is that the deleter would be called soon after the unique pointer goes out of scope.


  1. Is my understanding of eager finalization correct?
  2. Would this be valuable to have now for Julia 1.9 and beyond?

Digging deeper, it seems it may be too early to pursue this, since there are significant restrictions on which finalizers can be called eagerly. Still, following @aviatesk's demonstration, it is quite impressive how well this works.

using Test
include(normpath(Sys.BINDIR, "..", "share", "julia", "test", "compiler", "EscapeAnalysis", "setup.jl"))
const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
@test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,)))

mutable struct DoAllocWithFieldInter
    x
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization6(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(0)
        register_finalizer!(o)
        if i == 1000
            o.x = i # with `setfield!`
        elseif i > 0
            safeprint(io, o.x, '\n')
        end
        # <= shouldn't the finalizer be inlined here?
    end
end
let src = code_typed1(cfg_finalization6, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
    init_finalization_count!()
    cfg_finalization6(IOBuffer())
    @test get_finalization_count() == 1000 # this now succeeds!
end

I wonder if there’s a way to do a validation check or assertion on the finalizer function to ensure it is valid for eager finalization, and to throw an ArgumentError if it is not. That feels like the one (possibly) missing feature for me with eager finalization: I want to be really sure that something is going to be eagerly finalized, since that affects the design quite a bit in certain cases. Does anyone know if there’s a way to do that kind of assertion with the compiler, i.e. inspect the effects inferred for a function with given argument types and assert the properties required for eager finalization?

I think this might work:

julia> effects = Base.infer_effects(x->nothing)

julia> Core.Compiler.is_nothrow(effects)
true

julia> Core.Compiler.is_notaskstate(effects)
true

We want Core.Compiler.is_finalizer_inlineable to be true, I think:

julia> Core.Compiler.is_finalizer_inlineable(effects)
true
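To turn that into the requested assertion, one could sketch a small guard. The helper name is hypothetical, and it relies on `Core.Compiler` internals, which are not a stable API and may move between Julia versions:

```julia
# Hypothetical helper: throw if a candidate finalizer's inferred effects
# do not meet the requirements for eager (inlined) finalization.
# Note: Core.Compiler internals are not a stable API.
function assert_eager_finalizable(f, argtypes::Tuple)
    effects = Base.infer_effects(f, argtypes)
    Core.Compiler.is_finalizer_inlineable(effects) ||
        throw(ArgumentError("finalizer cannot be eagerly finalized"))
    return true
end

assert_eager_finalizable(x -> nothing, (Any,))  # passes
```

A call like `assert_eager_finalizable(f, (MyType,))` could then run once, e.g. at construction time, before registering `f` as a finalizer.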


This simplified example looks promising.

julia> n::Int = 0
0

julia> const safe_free = Base.@assume_effects :nothrow :notaskstate x->(global n += 1; Libc.free(x.ptr))
#3 (generic function with 1 method)

julia> mutable struct SafePointer
           ptr::Ptr{Cvoid}
       end

julia> function f()
           for i in 1:100
               s = SafePointer(Libc.malloc(sizeof(Int)))
               finalizer(safe_free, s)
           end
       end
f (generic function with 1 method)

julia> n
0

julia> f()

julia> n
100

Here is the documentation on the effects:

help?> Core.Compiler.Effects

  Represents computational effects of a method call.

  The effects are a composition of different effect bits that represent some program property of the method being analyzed. They are represented as Bool or UInt8 bits with the following meanings:


    β€’  nothrow::Bool: this method is guaranteed to not throw an exception.


    β€’  notaskstate::Bool: this method does not access any state bound to the current task and may thus be moved to a different task without changing observable behavior. Note that this currently implies that noyield as well, since yielding modifies the state of the current task, though this may be split in the future.

I think there is a problem.

julia> Base.@assume_effects :nothrow :notaskstate inlinable_libc_free(r) = Libc.free(r[])
inlinable_libc_free (generic function with 1 method)

julia> function foo()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[])
       end
foo (generic function with 1 method)

julia> foo()

julia> function bar()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[]), r
       end
bar (generic function with 1 method)

julia> first(bar())
5

I’m curious whether there are any plans to relax the restrictions on eager finalization?
Currently it seems too restrictive (basically even dict lookups are not allowed):

julia> Core.Compiler.infer_effects(get, (Dict{Int, Int}, Int, Int))

For GPU arrays it’d be great to have this working, since the GC is not aware of other memory spaces, and in a lot of scenarios we have to invoke it manually.
E.g. in render loops in Nerf.jl, each GC.gc(false) call takes ~1 ms, and that loop may run for 100+ iterations for a single frame render.
It may also solve the need for: Free CuArrays in the reverse pass by mcabbott · Pull Request #1340 · FluxML/Zygote.jl · GitHub

From what I saw in practice, I think the Nvidia driver is more robust, and CUDA.jl with its alloc/retry mechanism works fine without calling the GC (at least not as often).

But for AMDGPU it does not work reliably and can easily crash the runtime.
For example, during scratch allocation at kernel dispatch, which happens at the ROCr level and is not covered by the alloc/retry mechanism.

Also, I thought it inlined the finalizer and ran it on the same task, but then there is the :notaskstate requirement…
Does it mean that the finalizer is not inlined and may still run on a separate task?

For example:

y = ROCArray{Float32}(...)
for i in ...
    x = AMDGPU.rand(...)
    y .+= x
    # <- insert finalizer for `x` here and run it on this task and allow throwing an exception, for example...
    # Inline: finalize(x) ≡ AMDGPU.unsafe_free!(x)
end

I think the problem here is that unsafe_load(r[]) gets linearized to

_1 = r[]
# <------- No more references to `r`, so the finalizer gets inserted here!
_2 = unsafe_load(_1)
# <------- You actually want the finalizer to be inserted here

I am kind of oblivious to the matters discussed here and have been even more confused by reading ValeLang’s blog recently. Can anything discussed here be used to help the GC, or to avoid/replace it?

I think you might want to start a new topic on this.

Essentially, though, the first way to avoid garbage collection is to not create garbage to begin with. In any language, allocations eventually need to be deallocated; the question with garbage collection is only about when they will be deallocated. If you avoid allocations in the first place by relying on statically allocated structures, then there are fewer issues.

Beyond that, the question is really about whether one can prove that deallocation can happen sooner rather than later and perhaps in a more predictable fashion.
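As a small illustration of the "avoid allocations in the first place" point (a sketch; `squares!` is a made-up helper), a preallocated buffer can be reused across calls instead of allocating a fresh array each time:

```julia
# Sketch: reuse one preallocated buffer rather than allocating per call.
function squares!(buf::Vector{Float64}, xs)
    @assert length(buf) == length(xs)
    for i in eachindex(xs)
        buf[i] = xs[i]^2  # in-place write: no new array is created
    end
    return buf
end

buf = zeros(3)  # allocated once, up front
squares!(buf, [1.0, 2.0, 3.0])  # [1.0, 4.0, 9.0]
```

With the buffer hoisted out of a hot loop, there is simply nothing for the GC to collect per iteration, so the timing of deallocation stops mattering.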

The blog post used unique_ptr a lot throughout, so I thought it might be better to post on a semi-related thread :smiley:

Most of the techniques for avoiding GC in Julia are actually pretty well defined already, and people can find a good amount of info on them here. I was just curious what could be possible theoretically.

It’s also about spending runtime searching for things to deallocate, no? Without GC, you don’t need to spend time on that.
