Best way to have GC manage freeing C-allocated storage

What is the best (or a good) way of solving the following very generic problem? I’d like to:

  • Call obj_alloc() and obj_free() in some C library from Julia. On the Julia side, work with Ptr{obj}. It can be considered opaque: it is only passed to other functions in the same library.
  • Have the Julia GC handle calling obj_free for me, probably with a wrapper.
  • Have the simplest possible interface to the wrapped object. The user shouldn’t have to know where it comes from.
  • When using the obj, have as small a performance degradation as possible compared to using Ptr{obj} directly.

So, in particular, wrapping the alloc in a try block and freeing in finally is not what I’m looking for.

For example, here is an attempt for random number generators in GSL (the GNU Scientific Library):

using GSL: GSL

struct RNG
    x::Ptr{GSL.gsl_rng}
    _y::Ref
    function RNG(rng_type=GSL.gsl_rng_taus2)
        _rng = GSL.rng_alloc(rng_type)
        mrng = Ref(_rng)
        rng = new(_rng, mrng)
        finalizer(x -> GSL.rng_free(rng.x), mrng)
        return rng
    end
end

(rng::RNG)() = rng.x

Comments

  • This apparently uses more memory than just using Ptr.
  • In some tests generating uniform random samples, I see no performance degradation.
  • If I use a mutable struct as a wrapper, instead of struct, there is a loss of performance.
  • I use Ref above for convenience, but a mutable struct wrapper also works for the second field.

It seems like there must be a simpler way, but I don’t see it.
Here is a more generic version:

struct AllocFree{T, AF, FF}
    x::Ptr{T}
    _y::Ref
    function AllocFree{T,AF,FF}(args...) where {T, AF, FF}
        _obj = AF.instance(args...)
        mobj = Ref(_obj)
        obj = new{T,AF,FF}(_obj, mobj)
        finalizer(x -> FF.instance(obj.x), mobj)
        return obj
    end
end

(obj::AllocFree)() = obj.x

Then

julia> using GSL;

julia> mkrng(t=GSL.gsl_rng_taus2) = AllocFree{GSL.gsl_rng, typeof(GSL.rng_alloc), typeof(GSL.rng_free)}(t);

julia> rng = mkrng()
AllocFree{gsl_rng, typeof(rng_alloc), typeof(rng_free)}(Ptr{gsl_rng} @0x0000000002677af0, Base.RefValue{Ptr{gsl_rng}}(Ptr{gsl_rng} @0x0000000002677af0))

julia> GSL.ran_flat(rng(), 0.0, 1.0)
0.18691460322588682
Finalizers can be problematic if the C library does not tolerate the free being called from a different thread.

If possible, I would encourage the use of the do syntax, which implements the try-finally method you are avoiding. This hides the free logic from the user while also making the free occur at a deterministic time.

function use_obj(f::Function, args...)
   obj = obj_alloc(args...)
   try
       return f(obj)
   finally
       obj_free(obj)
   end
end

use_obj(args...) do obj
   # use object here.
end
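To make the do-block pattern above concrete without depending on a particular C library, here is a minimal runnable sketch. It assumes Libc.malloc and Libc.free as stand-ins for obj_alloc and obj_free, and free_calls is a hypothetical counter added only to make the free observable.

```julia
# Stand-ins for a C library's alloc/free pair (assumption: plain malloc/free).
const free_calls = Ref(0)            # hypothetical counter, for illustration only
obj_alloc(nbytes::Integer) = Libc.malloc(nbytes)
function obj_free(ptr::Ptr{Cvoid})
    Libc.free(ptr)
    free_calls[] += 1
    return nothing
end

# Allocate, hand the pointer to f, and always free, even if f throws.
function use_obj(f::Function, nbytes::Integer)
    obj = obj_alloc(nbytes)
    obj == C_NULL && throw(OutOfMemoryError())
    try
        return f(obj)
    finally
        obj_free(obj)
    end
end

# The pointer is only valid inside the do block.
result = use_obj(8) do obj
    p = Ptr{Int64}(obj)
    unsafe_store!(p, 42)
    unsafe_load(p) + 1
end
```

The value returned by the block is passed through use_obj, so only plain Julia values should leave the block, never the pointer itself.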

Now to address your example, note that Ref is an abstract type.

julia> isabstracttype(Ref)
true

julia> r = Ref(5)
Base.RefValue{Int64}(5)

julia> typeof(r)
Base.RefValue{Int64}

julia> isabstracttype(typeof(r))
false

julia> isconcretetype(typeof(r))
true

I recommend making the field concrete.

struct RNG
    x::Ptr{GSL.gsl_rng}
    _y::Base.RefValue{Ptr{GSL.gsl_rng}}
    ...
end

struct AllocFree{T, AF, FF}
    x::Ptr{T}
    _y::Base.RefValue{Ptr{T}}
...
end

What is the purpose of storing the same pointer in two different fields, versus having one field x::Ref{Ptr{T}} or having a mutable struct with a field x::Ptr{T}?

Can you give an example of a benchmark that shows the loss of performance you mention for a mutable struct wrapper? It’s possible it could be fixed by making the field types concrete.

When you’re accessing objects like this in Julia, I think it is a little more idiomatic to follow the convention of Ref and overload obj[] (that is, Base.getindex) instead of obj(). I.e., you’re getting the “only” index present in obj, rather than calling it as a function.

Note: I’ve only ever used mutable struct Obj{T}; ptr :: Ptr{T}; end when wrapping my own stuff, and just never ran into any problems I could notice, so I’m curious about this.
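As a rough sketch of the single-field mutable wrapper mentioned here, combined with the obj[] convention: the type name Handle and the helper release! are hypothetical, and Libc.malloc / Libc.free stand in for a real library's alloc and free functions.

```julia
# A mutable wrapper: the finalizer is attached to the wrapper itself.
mutable struct Handle
    ptr::Ptr{Cvoid}
    function Handle(nbytes::Integer)
        ptr = Libc.malloc(nbytes)          # stand-in for obj_alloc (assumption)
        ptr == C_NULL && throw(OutOfMemoryError())
        h = new(ptr)
        finalizer(release!, h)             # run release! when h is collected
        return h
    end
end

# Free once, then null the field so a second call is a no-op.
function release!(h::Handle)
    if h.ptr != C_NULL
        Libc.free(h.ptr)                   # stand-in for obj_free (assumption)
        h.ptr = C_NULL
    end
    return nothing
end

# Follow the Ref convention: h[] returns the raw pointer.
Base.getindex(h::Handle) = h.ptr

h = Handle(64)
unsafe_store!(Ptr{UInt8}(h[]), 0x2a)
stored = unsafe_load(Ptr{UInt8}(h[]))
finalize(h)                                # force the finalizer now, for the demo only
```

Because the struct is mutable, the finalizer can reset the pointer to C_NULL, which an immutable wrapper cannot do.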

He is potentially avoiding one level of indirection with the first field x. The memory layout of the struct is just two pointers. A mutable struct wrapping a pointer is really a pointer to a pointer.

The second field mainly is needed to create a mutable object that Julia will try to garbage collect. Technically we could simplify by just making _y a reference to nothing while capturing the pointer within the anonymous function.

struct RNG
    x::Ptr{GSL.gsl_rng}
    _y::Base.RefValue{Nothing}
    function RNG(rng_type=GSL.gsl_rng_taus2)
        _rng = GSL.rng_alloc(rng_type)
        rng = new(_rng, Ref(nothing))
        finalizer(_ -> GSL.rng_free(_rng), rng._y)
        return rng
    end
end
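The same idea can be tried without GSL. The following is only a sketch under stated assumptions: Buffer, _token and free_count are hypothetical names, and Libc.malloc / Libc.free replace GSL.rng_alloc / GSL.rng_free.

```julia
const free_count = Ref(0)   # hypothetical counter, only to observe the free

# Immutable wrapper: a raw pointer plus a mutable token that carries the finalizer.
struct Buffer
    ptr::Ptr{Cvoid}
    _token::Base.RefValue{Nothing}
    function Buffer(nbytes::Integer)
        ptr = Libc.malloc(nbytes)
        ptr == C_NULL && throw(OutOfMemoryError())
        token = Ref(nothing)
        # The closure captures ptr directly, so it never touches the wrapper.
        finalizer(token) do _
            Libc.free(ptr)
            free_count[] += 1   # not thread-safe; fine for a single-threaded demo
        end
        return new(ptr, token)
    end
end

buf = Buffer(16)
layout_is_two_pointers = sizeof(Buffer) == 2 * sizeof(Ptr{Cvoid})
finalize(buf._token)        # force the finalizer now, for the demo only
```

After the finalizer has run, buf.ptr is dangling and must not be used; in normal use the GC runs the finalizer only once the token is unreachable.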

I didn’t realize that this is a problem, or even think about it. But I’m not surprised. I’ll have to learn more about it. I mean, is it always a problem? If only sometimes, then when?

Regarding Ref: I know it’s an abstract type. Leaving the wrapped type out was intentional, as it seemed to make no difference in usage or performance (why would it?), so it only adds a useless detail. But I may have missed something.

EDIT: Something I missed, but you picked up:

x -> FF.instance(obj.x)

Using x for two different things here is confusing.

Exactly, that might even be better. For example, this might make it a bit more clear why the field _y exists.

It depends on the C library that you got the pointer from. For example, the HDF5 C library is not thread safe by default. If I attempt to close a HDF5 object from another thread while performing another operation concurrently, it will crash stochastically.

Using finalizers for resource management is really not a great idea. This is a very old issue that has been the subject of much debate.

A recent development that has been incorporated in the Julia code base is eager finalizer insertion. Essentially, if we can prove that the object will go out of scope, then we should try to call the finalizer right away.

Yes. I’ll also point out that the field of a RefValue is also x, so tracking x here is particularly fraught.

julia> r = Ref(5)
Base.RefValue{Int64}(5)

julia> r.x
5

julia> r[]
5

The performance hit is in the finalizer, which is hard to measure. Essentially, the compiler has no idea what is contained within the Ref, so you will end up getting dynamic dispatch on your free method from the finalizer.

struct Foo
    y::Ref    # abstract field type: the compiler cannot infer what y[] returns
end
g(foo::Foo) = foo.y[]
foo = Foo(Ref(5))

julia> @code_warntype g(foo)
MethodInstance for g(::Foo)
  from g(foo::Foo) in Main at REPL[33]:1
Arguments
  #self#::Core.Const(g)
  foo::Foo
Body::Any
1 ─ %1 = Base.getproperty(foo, :y)::Ref
│   %2 = Base.getindex(%1)::Any
└──      return %2

In any case, I think it’s safer (in terms of performance degradation), and easy, to make y inferrable.
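A small sketch of the difference, using hypothetical types FooAbstract and FooConcrete and a hypothetical accessor gety; Base.return_types reports what the compiler infers for each field declaration.

```julia
# Same accessor, two field declarations.
struct FooAbstract
    y::Ref                      # abstract field type
end

struct FooConcrete
    y::Base.RefValue{Int}       # concrete field type
end

gety(foo) = foo.y[]

# Inferred return types for each wrapper.
abstract_rt = only(Base.return_types(gety, (FooAbstract,)))   # Any
concrete_rt = only(Base.return_types(gety, (FooConcrete,)))   # Int
```

With the concrete field, code that reads y[] (including a finalizer) can be compiled against a known type instead of dispatching at run time.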

I think the question of a canonical way to do this is still interesting. But, it’s clear it shouldn’t be recommended for general use at this point.

Also, a remark in a different comment above:

I think it is a little more idiomatic to follow the convention of Ref and override obj[] instead of obj()

I thought about that. I thought it might be confusing, that someone would think this is essentially dereferencing a Ref. But, now, I think I agree with you.