Eager finalization and smart pointers

mkitti · January 3, 2023, 5:28pm

Smart pointers are pointers that free themselves when they are no longer needed. In Julia, we usually use the garbage collector for this purpose. However, we sometimes work with memory that is not managed by the garbage collector, particularly when using C foreign function interface. For example, consider unique_ptr from the C++ standard library.

Julia 1.9 implements the eager finalization suggestion by @jpsamaroo via #45272.

github.com/JuliaLang/julia

Eager finalizer insertion

JuliaLang:master ← JuliaLang:kf/eagerfinalizers

opened 07:13AM - 11 May 22 UTC

Keno

+382 -79

This is a variant of the eager-finalization idea (e.g. as seen in #44056), but …with a focus on the mechanism of finalizer insertion, since I need a similar pass downstream. Integration of EscapeAnalysis is left to #44056. My motivation for this change is somewhat different. In particular, I want to be able to insert finalize call such that I can subsequently SROA the mutable object. This requires a couple design points that are more stringent than the pass from #44056, so I decided to prototype them as an independent PR. The primary things I need here that are not seen in #44056 are: - The ability to forgo finalizer registration with the runtime entirely (requires additional legality analyis) - The ability to inline the registered finalizer at the deallocation point (to enable subsequent SROA) To this end, adding a finalizer is promoted to a builtin that is recognized by inference and inlining (such that inference can produce an inferred version of the finalizer for inlining). The current status is that this fixes the minimal example I wanted to have work, but does not yet extend to the motivating case I had. Nevertheless, I felt that this was a good checkpoint to synchronize with other efforts along these lines. Currently working demo: ```julia julia> const total_deallocations = Ref{Int}(0) Base.RefValue{Int64}(0) julia> mutable struct DoAlloc function DoAlloc() this = new() Core.finalizer(this) do this global total_deallocations[] += 1 end return this end end julia> function foo() for i = 1:1000 DoAlloc() end end foo (generic function with 1 method) julia> @code_llvm foo() ; @ REPL[3]:1 within `foo` define void @julia_foo_428() #0 { top: %.promoted = load i64, i64* inttoptr (i64 4373384000 to i64*), align 64 ; @ REPL[3]:2 within `foo` %0 = add i64 %.promoted, 1000 ; @ REPL[3]:3 within `foo` ; ┌ @ REPL[2]:4 within `DoAlloc` ; │┌ @ REPL[2]:5 within `#3` ; ││┌ @ Base.jl within `setproperty!` store i64 %0, i64* inttoptr (i64 4373384000 to i64*), align 64 ; └└└ ; @ REPL[3]:4 within `foo` ret void } julia> foo() julia> total_deallocations[] 1000 ``` Thoughts @jpsamaroo @aviatesk ?

This suggests to me that we should now implement smart pointers. For example, we could implement UniquePointer as follows.

struct UniquePointer{T} <: Ref{T}
   deleter::Function
   ptr::Ptr{T}
   # The finalizer is attached to the RefValue since
   # UniquePointer is immutable
   deleted::Base.RefValue{Bool}
   # Accept a function first to support do syntax
   function UniquePointer(deleter, ptr)
      self = new{eltype(ptr)}(deleter, ptr, Ref(false))
      finalizer(self.deleted) do deleted
          deleted[] || self.deleter(ptr)
          deleted[] = true
      end
      return self
   end
   UniquePointer(ptr) = UniquePointer(Libc.free, ptr)
end
Base.unsafe_load(up::UniquePointer, args...) = unsafe_load(up.ptr, args...)
Base.unsafe_store!(up::UniquePointer, args...) = unsafe_store!(up.ptr, args...)
deleter(up::UniquePointer) = up.deleter
"""
   release(up::UniquePointer)

Release the `Ptr` from management and return the pointer.
"""
function release(up::UniquePointer)
   up.deleted[] = true
   return up.ptr
end

My understanding is that the deleter would be called soon after the unique pointer goes out of scope.

Questions:

Is my understanding of eager finalization correct?
Would this be valuable to have now for Julia 1.9 and beyond?

mkitti · January 3, 2023, 9:07pm

Digging deeper, it seems it may be too early to pursue this since there are significant restrictions in terms of what kind of finalizers can be called eagerly. Following @aviatesk 's demonstration, it is still quite impressive how well this works.

github.com/JuliaLang/julia

Comment by aviatesk to inlining: relax finalizer inlining control-flow restriction

JuliaLang:master ← JuliaLang:avi/inline-cfg-finalizer

Looks great, thanks sooo much for implementing the post-domination analysis, tha…t is very exciting itself. I think there seems to be a problem in finalizer insertion point though, MRE would be: ```julia const FINALIZATION_COUNT = Ref(0) init_finalization_count!() = FINALIZATION_COUNT[] = 0 get_finalization_count() = FINALIZATION_COUNT[] @noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x @noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...)) @test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,))) mutable struct DoAllocWithFieldInter x::Int end function register_finalizer!(obj::DoAllocWithFieldInter) finalizer(obj) do this add_finalization_count!(this.x) end end function cfg_finalization6(io) for i = -999:1000 o = DoAllocWithFieldInter(0) register_finalizer!(o) if i == 1000 o.x = i # with `setfield!` elseif i > 0 safeprint(io, o.x, '\n') end # <= shouldn't the finalizer be inlined here? end end let src = code_typed1(cfg_finalization6, (IO,)) @test count(isinvoke(:add_finalization_count!), src.code) == 1 end let init_finalization_count!() cfg_finalization6(IOBuffer()) @test get_finalization_count() == 1000 # this fails end ``` Currently the finalizer is inlined after the allocation site, so can't observe the field value changed by `setfield!`: ```julia julia> src = code_typed1(cfg_finalization6, (IO,)) CodeInfo( 1 ── goto #11 if not true 2 ┄─ %2 = φ (#1 => -999, #10 => %20)::Int64 │ %3 = φ (#1 => -999, #10 => %21)::Int64 │ %4 = %new(Main.DoAllocWithFieldInter, 0)::DoAllocWithFieldInter │ %5 = Base.getfield(%4, :x)::Int64 │ invoke Main.add_finalization_count!(%5::Int64)::Int64 │ %7 = (%2 === 1000)::Bool └─── goto #4 if not %7 3 ── Base.setfield!(%4, :x, %2)::Int64 └─── goto #6 4 ── %11 = Base.slt_int(0, %2)::Bool └─── goto #6 if not %11 5 ── %13 = Base.getfield(%4, :x)::Int64 └─── invoke Main.safeprint(io::IO, %13::Any, '\n'::Vararg{Any})::Any 6 ┄─ %15 = (%3 === 1000)::Bool └─── goto #8 if not %15 7 ── goto #9 8 ── %18 = Base.add_int(%3, 1)::Int64 └─── goto #9 9 ┄─ %20 = φ (#8 => %18)::Int64 │ %21 = φ (#8 => %18)::Int64 │ %22 = φ (#7 => true, #8 => false)::Bool │ %23 = Base.not_int(%22)::Bool └─── goto #11 if not %23 10 ─ goto #2 11 ┄ return nothing ) ```

using Test
include(normpath(Sys.BINDIR, "..", "share", "julia", "test", "compiler", "EscapeAnalysis", "setup.jl"))
const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
@test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,)))

mutable struct DoAllocWithFieldInter
    x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization6(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(0)
        register_finalizer!(o)
        if i == 1000
            o.x = i # with `setfield!`
        elseif i > 0
            safeprint(io, o.x, '\n')
        end
        # <= shouldn't the finalizer be inlined here?
    end
end
let src = code_typed1(cfg_finalization6, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
    init_finalization_count!()
    cfg_finalization6(IOBuffer())
    @test get_finalization_count() == 1000 # this now succeeds!
end

quinnj · January 4, 2023, 12:33am

I wonder if there’s a way you could do a validation check or assert on the finalizer function to ensure it was valid for eager finalization? And if not, throw an argument error? I feel like that’s the only (maybe) missing feature for me w/ eager finalization is that I want to really make sure that it’s going to be eagerly finalized (since that affects the design quite a bit in certain cases). Does anyone know if there’s a way to do that kind of assertion w/ the compiler? Inspect the effects inferred on function w/ a given argument and assert the right things for eager finalization?

mkitti · January 4, 2023, 12:43am

I think this might work:

julia> effects = Base.infer_effects(x->nothing)
(+c,+e,+n,+t,+s,+m)

julia> Core.Compiler.is_nothrow(effects)
true

julia> Core.Compiler.is_notaskstate(effects)
true

We want Core.Compiler.is_finalizer_inlineable to be true, I think:

github.com

JuliaLang/julia/blob/52af407745f23512d48f489658bbe07cb21b5808/base/compiler/effects.jl#L169-L171


      
          is_finalizer_inlineable(effects::Effects) =
              is_nothrow(effects) &&
              is_notaskstate(effects)

mkitti · January 4, 2023, 1:01am

This simplified example looks promising.

julia> n::Int = 0
0

julia> const safe_free = Base.@assume_effects :nothrow :notaskstate x->(global n += 1;Libc.free(x.ptr))
#3 (generic function with 1 method)

julia> mutable struct SafePointer
           ptr::Ptr{Int}
       end

julia> function f()
           for i in 1:100
               s = SafePointer(Libc.malloc(sizeof(Int)))
               finalizer(safe_free, s)
           end
           nothing
       end
f (generic function with 1 method)

julia> n
0

julia> f()

julia> n
100

Here is the documentation on the effects:

help?> Core.Compiler.Effects
  effects::Effects

  Represents computational effects of a method call.

  The effects are a composition of different effect bits that represent some program property of the method being analyzed. They are represented as Bool or UInt8 bits with the following meanings:

...

    •  nothrow::Bool: this method is guaranteed to not throw an exception.

...

    •  notaskstate::Bool: this method does not access any state bound to the current task and may thus be moved to a different task without changing observable behavior. Note that this currently implies that noyield as well, since yielding modifies the state of the current task, though this may be split in the future.

mkitti · January 4, 2023, 5:20pm

I think there is a problem.

julia> Base.@assume_effects :nothrow :notaskstate inlinable_libc_free(r) = Libc.free(r[])
inlinable_libc_free (generic function with 1 method)

julia> function foo()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[])
       end
foo (generic function with 1 method)

julia> foo()
42156479

julia> function bar()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[]), r
       end
bar (generic function with 1 method)

julia> first(bar())
5

pxl-th · March 1, 2023, 12:54pm

I’m curious if there are any plans to relax restrictions on eager finalization?
As currently it seems too restrictive (basically even dict lookups are not allowed):

julia> Core.Compiler.infer_effects(get, (Dict{Int, Int}, Int, Int))
(!c,+e,!n,!t,+s,?m,+i)

For GPU arrays it’d be great to have it working as GC is not aware of other memory spaces and in a lot of scenarious we have to call it manually.
E.g. render loops in Nerf.jl, where each GC.gc(false) call takes ~1ms, where that loop may run for 100+ iterations for a single frame render.
It may also solve the need for: Free CuArrays in the reverse pass by mcabbott · Pull Request #1340 · FluxML/Zygote.jl · GitHub

From what I saw in practive, I think Nvidia driver is more robust and CUDA.jl with its alloc/retry mechanism works fine without calling GC (at least not as often).

But for AMDGPU it does not work reliably and can easily crash the runtime.
For example during scratch allocation at kernel dispatch which happens at the ROCr level and is not covered by alloc/retry mechanism.

Also, I though it inlines finalizer and runs it on the same task, but then there is :notaskstate requirement…
Does it mean that finalizer is not inlined and may still run on the separate task?

For example:

y = ROCArray{Float32}(...)
for i in ...
    x = AMDGPU.rand(...)
    y .+= x
    # <- insert finalizer for `x` here and run it on this task and allow throwing an exception, for example...
    # Inline: finalize(x) ≡ AMDGPU.unsafe_free!(x)
end
y

Mason · March 31, 2023, 1:38am

mkitti:

I think there is a problem.

julia> Base.@assume_effects :nothrow :notaskstate inlinable_libc_free(r) = Libc.free(r[])
inlinable_libc_free (generic function with 1 method)

julia> function foo()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[])
       end
foo (generic function with 1 method)

I think the problem here is that unsafe_load(r[]) gets linearized to

_1 = r[]
# <------- No more references to r, so the finalizer gets inserted here!
unsafe_load(_1)
# <------- You actually want the finalizer to be inserted here

AMJ · June 23, 2023, 6:38pm

I am kinda oblivious about matters discussed here and have been more confused by reading the ValeLang’s blog recently, can anything discussed from here used for helping GC or avoid/replace it?

mkitti · June 23, 2023, 6:46pm

I think you might want to start a new topic on this.

Essentially though the first way to avoid garbage collection is not create garbage to begin with. In any language, allocations will need to be deallocated. The question about garbage collection is only about when they will be deallocated. If you avoid allocations to begin by relying on statically allocated structures, then there are fewer issues.

Beyond that, the question is really about whether one can prove that deallocation can happen sooner rather than later and perhaps in a more predictable fashion.

AMJ · June 23, 2023, 6:57pm

The blog post used unique_prt a lot throughout the post, I thought maybe it would have been better to post on a semi related post

Most of the technicians for avoiding GC on Julia are actually pretty well defined already and people can find a good amount of info on it here. I was just curious what could be possible theoretically.

jar1 · June 23, 2023, 7:02pm

It’s also about spending runtime searching for things to deallocate, no? Without gc, you don’t need to spend time on that.

Topic		Replies	Views
Garbage collector and memory management New to Julia	2	994	February 12, 2019
Eager finalization insertion location question General Usage	1	192	September 3, 2023
Finalizer bug? Or am I dumb? General Usage question , potential-bug	3	615	February 24, 2023
Some questions about finalizers New to Julia	1	1818	May 2, 2021
Borrow checker with GC fallback? Internals & Design	4	697	March 19, 2024

Eager finalization and smart pointers

Related topics