Smart pointers are pointers that free themselves when they are no longer needed. In Julia, we usually rely on the garbage collector for this purpose. However, we sometimes work with memory that is not managed by the garbage collector, particularly when using the C foreign function interface. For example, consider unique_ptr from the C++ standard library.
This suggests to me that we could now implement smart pointers in Julia. For example, we could implement a UniquePointer as follows.
struct UniquePointer{T} <: Ref{T}
    deleter::Function
    ptr::Ptr{T}
    # The finalizer is attached to the RefValue since
    # UniquePointer is immutable
    deleted::Base.RefValue{Bool}
    # Accept a function first to support do syntax
    function UniquePointer(deleter, ptr)
        self = new{eltype(ptr)}(deleter, ptr, Ref(false))
        finalizer(self.deleted) do deleted
            deleted[] || self.deleter(ptr)
            deleted[] = true
        end
        return self
    end
    UniquePointer(ptr) = UniquePointer(Libc.free, ptr)
end
Base.unsafe_load(up::UniquePointer, args...) = unsafe_load(up.ptr, args...)
Base.unsafe_store!(up::UniquePointer, args...) = unsafe_store!(up.ptr, args...)
deleter(up::UniquePointer) = up.deleter
"""
release(up::UniquePointer)
Release the `Ptr` from management and return the pointer.
"""
function release(up::UniquePointer)
    up.deleted[] = true
    return up.ptr
end
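To make the intended use concrete, here is a minimal, hypothetical usage sketch assuming only the definitions above; the malloc'd Int and the custom deleter are just for illustration.
# Hypothetical usage sketch (not part of the definitions above).
# Wrap a malloc'd Int; the default deleter is Libc.free:
up = UniquePointer(Ptr{Int}(Libc.malloc(sizeof(Int))))
unsafe_store!(up, 42)
@assert unsafe_load(up) == 42

# The do-syntax form supplies a custom deleter (the deleter is the first argument):
up2 = UniquePointer(Ptr{Int}(Libc.malloc(sizeof(Int)))) do p
    Libc.free(p)
end

# release() gives up ownership, so the caller must free the pointer manually:
p = release(up2)
Libc.free(p)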
My understanding is that the deleter would be called soon after the unique pointer goes out of scope.
Questions:
Is my understanding of eager finalization correct?
Would this be valuable to have now for Julia 1.9 and beyond?
Digging deeper, it seems it may be too early to pursue this, since there are significant restrictions on what kinds of finalizers can be called eagerly. Still, following @aviatesk's demonstration, it is quite impressive how well this works.
using Test
include(normpath(Sys.BINDIR, "..", "share", "julia", "test", "compiler", "EscapeAnalysis", "setup.jl"))

const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
@test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,)))

mutable struct DoAllocWithFieldInter
    x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization6(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(0)
        register_finalizer!(o)
        if i == 1000
            o.x = i # with `setfield!`
        elseif i > 0
            safeprint(io, o.x, '\n')
        end
        # <= shouldn't the finalizer be inlined here?
    end
end

let src = code_typed1(cfg_finalization6, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end

let
    init_finalization_count!()
    cfg_finalization6(IOBuffer())
    @test get_finalization_count() == 1000 # this now succeeds!
end
I wonder if there's a way to do a validation check or assert on the finalizer function to ensure it is valid for eager finalization, and throw an ArgumentError if not? I feel like that's the only (maybe) missing feature for me with eager finalization: I want to really make sure that it's going to be eagerly finalized (since that affects the design quite a bit in certain cases). Does anyone know if there's a way to do that kind of assertion with the compiler? Inspect the effects inferred on a function with a given argument and assert the right things for eager finalization?
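One rough sketch of such a check, reusing the internal predicate already used in the demonstration above; note that Core.Compiler internals are not a stable API and the helper name here is made up:
# Hedged sketch, not a supported API: reuse the compiler's own predicate
# (seen in the test above) to reject finalizers that won't be inlined eagerly.
function assert_eagerly_finalizable(f, argtypes::Tuple)
    effects = Base.infer_effects(f, argtypes)
    Core.Compiler.is_finalizer_inlineable(effects) ||
        throw(ArgumentError("$f is not eligible for eager finalization: $effects"))
    return true
end

assert_eagerly_finalizable(add_finalization_count!, (Int,))  # passes, per the @test above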
julia> n::Int = 0
0

julia> const safe_free = Base.@assume_effects :nothrow :notaskstate x -> (global n += 1; Libc.free(x.ptr))
#3 (generic function with 1 method)

julia> mutable struct SafePointer
           ptr::Ptr{Int}
       end

julia> function f()
           for i in 1:100
               s = SafePointer(Libc.malloc(sizeof(Int)))
               finalizer(safe_free, s)
           end
           nothing
       end
f (generic function with 1 method)

julia> n
0

julia> f()

julia> n
100
Here is the documentation on the effects:
help?> Core.Compiler.Effects
effects::Effects
Represents computational effects of a method call.
The effects are a composition of different effect bits that represent some program property of the method being analyzed. They are represented as Bool or UInt8 bits with the following meanings:
...
• nothrow::Bool: this method is guaranteed to not throw an exception.
...
• notaskstate::Bool: this method does not access any state bound to the current task and may thus be moved to a different task without changing observable behavior. Note that this currently implies that noyield as well, since yielding modifies the state of the current task, though this may be split in the future.
julia> Base.@assume_effects :nothrow :notaskstate inlinable_libc_free(r) = Libc.free(r[])
inlinable_libc_free (generic function with 1 method)

julia> function foo()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[])
       end
foo (generic function with 1 method)

julia> foo()
42156479

julia> function bar()
           r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
           finalizer(inlinable_libc_free, r)
           unsafe_store!(r[], 5)
           unsafe_load(r[]), r
       end
bar (generic function with 1 method)

julia> first(bar())
5
I'm curious whether there are any plans to relax the restrictions on eager finalization?
Currently it seems too restrictive (basically, even dict lookups are not allowed).
For GPU arrays it'd be great to have this working, since the GC is not aware of other memory spaces and in a lot of scenarios we have to invoke it manually.
E.g. render loops in Nerf.jl, where each GC.gc(false) call takes ~1 ms and the loop may run for 100+ iterations for a single frame render.
It may also solve the need for: Free CuArrays in the reverse pass by mcabbott · Pull Request #1340 · FluxML/Zygote.jl · GitHub
From what I've seen in practice, the Nvidia driver is more robust, and CUDA.jl with its alloc/retry mechanism works fine without calling the GC (or at least not as often).
But for AMDGPU it does not work reliably and can easily crash the runtime, for example during scratch allocation at kernel dispatch, which happens at the ROCr level and is not covered by an alloc/retry mechanism.
Also, I thought it inlines the finalizer and runs it on the same task, but then there is the :notaskstate requirement…
Does that mean the finalizer is not inlined and may still run on a separate task?
For example:
y = ROCArray{Float32}(...)
for i in ...
    x = AMDGPU.rand(...)
    y .+= x
    # <- insert finalizer for `x` here and run it on this task and allow throwing an exception, for example...
    # Inline: finalize(x) ≡ AMDGPU.unsafe_free!(x)
end
y
I think the problem here is that unsafe_load(r[]) gets linearized to
_1 = r[]
# <------- No more references to r, so the finalizer gets inserted here!
unsafe_load(_1)
# <------- You actually want the finalizer to be inserted here
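If that linearization is indeed the culprit, one possible (untested) workaround is to extend the lifetime of r explicitly across the load, e.g. with GC.@preserve; whether the eager-finalization pass honors the preserve region in every case is something I would want to verify:
# Hedged sketch: keep `r` rooted until after the load so the finalizer
# cannot be inserted between `r[]` and `unsafe_load`.
function foo_preserved()
    r = Ref(Ptr{Int}(Libc.malloc(sizeof(Int))))
    finalizer(inlinable_libc_free, r)
    GC.@preserve r begin
        unsafe_store!(r[], 5)
        unsafe_load(r[])
    end
end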
I am kinda oblivious to the matters discussed here and have been more confused after recently reading the ValeLang blog. Can anything discussed here be used to help the GC, or to avoid/replace it?
I think you might want to start a new topic on this.
Essentially, though, the first way to avoid garbage collection is to not create garbage to begin with. In any language, allocations will eventually need to be deallocated; the question with garbage collection is only when that happens. If you avoid allocations in the first place by relying on statically allocated or preallocated structures (see the sketch below), then there are fewer issues.
Beyond that, the question is really about whether one can prove that deallocation can happen sooner rather than later, and perhaps in a more predictable fashion.
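As a toy illustration of that point (the function and buffer names here are made up for the example), preallocating a buffer once and mutating it in place keeps the hot loop allocation-free, so there is nothing for the GC to collect:
# Toy sketch: allocate the working buffer once, then only mutate it in place.
function accumulate_into!(out::Vector{Float64}, n::Int)
    fill!(out, 0.0)
    for i in 1:n
        @. out += i   # in-place broadcast: no per-iteration allocation
    end
    return out
end

buffer = Vector{Float64}(undef, 1024)   # allocated once, reused across calls
accumulate_into!(buffer, 100)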
The blog post used unique_ptr a lot throughout, so I thought it might be better to post in a semi-related thread.
Most of the techniques for avoiding GC in Julia are actually pretty well established already, and people can find a good amount of info on them here. I was just curious what could be possible theoretically.