Can I mimic finalizing for immutables, or is an immutable file handle infeasible?

I can’t think of other uses for destructors/finalizers in a language without manual deallocation, so I’m using file closing as an example. It seems to me that I could implement open to open a file and construct an immutable file descriptor, then implement close to use that descriptor to close the file. My concern, however, is that the file descriptor isn’t tracked by the garbage collector, so unlike the typical IOStream, which registers finalizer(close, x), the file won’t be closed automatically when the descriptor goes out of scope. Granted, I could just construct the descriptor again in order to close the file, but I would need to remember its value. For the sake of this thread, let’s assume that automatic closing upon deallocation would help, because I am forgetful enough to neglect the higher-order functions that handle setup and cleanup automatically, like open’s other method for the open-close pattern.
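For reference, the higher-order method of open mentioned above looks like this. It is a minimal sketch that uses a throwaway file from tempname(); the stream is closed when the block returns or throws, without involving the GC at all.

```julia
# Minimal sketch of the do-block (higher-order) method of Base.open.
path = tempname()   # throwaway file, for illustration only

# open(f, path, "w") passes the stream to f and closes it afterwards,
# even if f throws; whatever f returns is returned by open.
stream = open(path, "w") do io
    write(io, "hello")
    io   # return the stream itself only so we can inspect it afterwards
end

@show isopen(stream)        # false: closed when the block finished
@show read(path, String)    # "hello"
rm(path)
```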

More thought: immutable semantics allow several unmonitored copies to represent the same instance, which is notably different from typical languages with manual memory management and destructors. Any mimicry of automatic finalization must not run once per copy; it would have to run only when the last copy is gone, which sounds impossible in the general case for something that isn’t GC-tracked.
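The copy problem can be seen directly in Julia. In this small sketch, FileDescriptor and MutableDescriptor are hypothetical types: immutable values compare by content, so nothing distinguishes one copy from another, and finalizer refuses to register on them at all.

```julia
# Hypothetical immutable descriptor type.
struct FileDescriptor
    fd::Int
end

a = FileDescriptor(3)
b = a   # another binding; for an immutable this is indistinguishable from a copy
@show a === b                     # true
@show a === FileDescriptor(3)     # true: egality compares contents, not addresses

# Hypothetical mutable counterpart: mutable objects have identity,
# which is what a finalizer attaches to.
mutable struct MutableDescriptor
    fd::Int
end
@show MutableDescriptor(3) === MutableDescriptor(3)   # false

# Julia refuses to attach a finalizer to an immutable value.
err = try
    finalizer(x -> nothing, a)
    nothing
catch e
    e
end
@show err isa ErrorException      # true
```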


If your immutable type has a mutable component, you could add a finalizer to that.

julia> struct SpecialFileHandle                                                        
           fileid::Int
           ref::Base.RefValue{SpecialFileHandle}
           function SpecialFileHandle(fileid)
               self = new(fileid, Ref{SpecialFileHandle}())
               self.ref[] = self
               finalizer(r->close(r[]), self.ref)
               return self
           end
       end

julia> Base.close(r::SpecialFileHandle) = @async println("Closing file handle $(r.fileid)")

julia> h = SpecialFileHandle(3)
SpecialFileHandle(3, Base.RefValue{SpecialFileHandle}(SpecialFileHandle(#= circular reference @-2 =#)))

julia> GC.gc()

julia> h = nothing

julia> GC.gc()
Closing file handle 3


This is not a valid use of finalizers. Julia is allowed to run a finalizer as soon as the object it’s attached to goes out of scope. Since you are attaching the finalizer only to the RefValue, it is perfectly valid for Julia to close the file immediately after a SpecialFileHandle is created, unless h.ref[] is explicitly referenced somewhere later. It just happens to work in global scope, because there h.ref never goes out of scope.


If I have a reference to h, why would h.ref be marked for finalization and garbage collected?

If you actually hold a GC-tracked reference to h (as is the case here, because the return value is leaked as a global binding), h.ref is indeed rooted as well and won’t be finalized while that reference exists. As soon as you use this pattern in a situation where h doesn’t escape, though, there’s no such guarantee anymore: h.ref’s lifetime ends as soon as h.ref is no longer used, regardless of h.fileid still being referenced later on.

Julia optimizes away allocations of immutable objects quite aggressively, since avoiding GC interactions in hot code paths is essential for good performance. When the finalizer actually runs then depends on the presence of GC safepoints and internal heuristics; in some cases, finalizers can even be inlined now.
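To illustrate the kind of optimization meant here, a small sketch with a hypothetical Point type and sumpoint function: an immutable value that is built and consumed inside one function never needs to touch the GC heap, so measuring the call typically reports zero bytes allocated. Whether the same happens for an immutable holding a Ref, like SpecialFileHandle, is up to the compiler’s analysis and heuristics, which is why its lifetime cannot be relied on.

```julia
# Hypothetical small immutable type.
struct Point
    x::Int
    y::Int
end

# p is created, used, and discarded inside the function; it never escapes,
# so it does not need a heap allocation.
sumpoint(a, b) = (p = Point(a, b); p.x + p.y)

sumpoint(1, 2)                   # run once so compilation isn't measured
@show @allocated sumpoint(1, 2)  # typically 0 bytes
```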


How isn’t this valid? The finalization should happen if there aren’t any active references to the overall SpecialFileHandle instance, even if you assigned (and copied) the fileid integer to something. This is the case for mutable structs too:

julia> mutable struct Reffed{T}  val::T  end

julia> kv = let
         k = Reffed(1)
         finalizer(x -> println("Finalized!"), k)
         k.val
       end
1

julia> GC.gc()
Finalized!

julia> kv
1

Even if the compiler splits the SpecialFileHandle instance apart by field for some optimizations, I expect that while the program semantically has a reference to the overall instance, the .ref field remains live.

This definitely confuses me: if a SpecialFileHandle instance is still live, why would its .ref field be collected and finalized? Wouldn’t that just leave a dangling pointer?

If anything, I’d see the risk as the .ref field being assigned somewhere I’m not aware of, so the file becomes nigh-uncloseable when the overall instance goes out of scope.

The field in Reffed is not the same as the value contained within. The field chiefly belongs to k, which is out of scope (and can thus be finalized) after the let block finishes execution. You’re not returning the field k.val from the let block, but the value that field is referring to (which is immutable and doesn’t have any finalizer attached to it).

Yes, that’s what I mean, and the behavior makes sense to me.

What Simeon is talking about is that attaching a finalizer to the .ref field does NOT mean that the SpecialFileHandle sticks around. The compiler is allowed to “split” the struct into multiple smaller pieces if it knows that only part of it is referenced, and in this case it is allowed to do that precisely because SpecialFileHandle is immutable. So if you only ever use .fileid after constructing a SpecialFileHandle and the struct is split up, the compiler is free to call the finalizer attached to the Ref stored in .ref immediately, because .ref itself is never used. There is no semantic construct linking the lifetimes of .fileid and .ref: put bluntly, immutable structs are defined by their bits, not their addresses, and you can slice them up however you want as long as you know how to put them back together.

In the case brought up by @mkitti, the parent object SpecialFileHandle leaks to global scope, so of course the entire object needs to stick around in some form or another. That’s not going to be the case in more complicated situations though, where it can be split up.

Isn’t this fine? The compiler seems pretty conservative about labelling things as “never used”, so if a function ever does h = SpecialFileHandle(3); read(h, String) it won’t assume h.ref is not used.

No; this is a bug waiting to happen. A conservative compiler today removes your “fine” code tomorrow; if you need to establish this relationship, best to keep it alive explicitly.
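A minimal sketch of keeping the relationship alive explicitly, assuming the immutable-plus-Ref design from above is kept: GC.@preserve marks h as in use for the duration of a block. HandleWithRef, work_on and use_handle are hypothetical names, and our understanding that preserving an immutable struct also roots its reference fields is worth checking against the GC.@preserve documentation.

```julia
# Hypothetical immutable wrapper mirroring SpecialFileHandle: a plain field
# plus a mutable RefValue that would carry the finalizer.
struct HandleWithRef
    fileid::Int
    ref::Base.RefValue{Int}
end

# Hypothetical stand-in for real work that only needs the raw file id.
work_on(fileid::Int) = fileid * 2

function use_handle(h::HandleWithRef)
    # Keep h (and with it h.ref) alive for the whole block, even though
    # the block itself only reads h.fileid.
    GC.@preserve h begin
        work_on(h.fileid)
    end
end

handle = HandleWithRef(3, Ref(0))
@show use_handle(handle)   # 6
```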

That depends on how read is implemented; if it only uses h.fileid and doesn’t touch the h.ref, the compiler is free to finalize the Ref precisely because it’s never used. It may even elide the allocation of that object completely if the field is never ever read from in the first place.

This topic actually also came up on the last triage call: finalizers are completely the wrong place to ensure that an object such as an open file is closed or flushed. Finalizers are not guaranteed to run, so it’s not valid to rely on them for the correctness of your program. For example, a long-running web server that doesn’t allocate will never trigger a garbage collection, and hence its finalizers won’t run.
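The deterministic alternative is to tie cleanup to control flow rather than to the GC. Here is a minimal sketch of a generic helper in the style of open(f, path); with_resource is a hypothetical name, not a Base function.

```julia
# Hypothetical helper: acquire, use, and always release, in that order.
function with_resource(f, acquire, release)
    r = acquire()
    try
        return f(r)
    finally
        release(r)   # runs whether f returns normally or throws
    end
end

# Usage with a toy "resource" that just logs events.
events = String[]
result = with_resource(() -> (push!(events, "acquire"); 42),
                       r -> push!(events, "release")) do r
    push!(events, "use")
    r + 1
end
@show result   # 43
@show events   # ["acquire", "use", "release"]
```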


Bit of an aside, but this finalizer actually has another bug: you’re not allowed to println in a finalizer, because that can cause a task switch, and finalizers must not switch tasks.
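Julia’s finalizer docstring shows two ways to do IO from a finalizer: wrapping it in @async, or calling ccall(:jl_safe_printf, ...). Below is a minimal sketch of the @async variant. Tracked, track and finalized_ids are hypothetical, and finalize is called explicitly only so the example runs deterministically.

```julia
# Hypothetical type, for illustration.
mutable struct Tracked
    id::Int
end

# Hypothetical log, filled by the deferred task rather than the finalizer.
const finalized_ids = Int[]

function track(id::Int)
    t = Tracked(id)
    finalizer(t) do x
        fid = x.id
        # Don't println (or do anything else that may yield) directly here;
        # schedule the work on a new task instead.
        @async begin
            push!(finalized_ids, fid)
            println("Finalized Tracked($fid)")
        end
    end
    return t
end

t = track(7)
finalize(t)   # run the registered finalizer now, for demonstration only
yield()       # give the scheduled task a chance to run
```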


Oh dang. Definitely if inlined.

Oh that’s right, all the long zero-allocation loops we strive for will just let these open files gather dust.

So, why are file handles still given finalizers that close files? Is it just that this can’t be removed for backwards compatibility?

Is there anything that a finalizer should do if it’s not guaranteed to ever run because the GC isn’t either?

Yes, though even if not inlined; a sufficiently smart compiler could propagate the information that only a part of the argument actually needs to be passed back into the caller. Julia doesn’t do that at the moment as far as I know, but it’s valid for it to do that.

Are they? If so, I think it’s mostly as a non-guaranteed fallback if a developer forgets it. A best-effort is maybe a bit better than doing nothing at all (not that I personally think that’s good, as it hides bugs).
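If a finalizer fallback is kept anyway, one way to stop it from hiding bugs is to have it record that it had to act. This is a sketch under assumptions: LeakCheckedHandle and LEAKED are hypothetical, close here only flips a flag where a real implementation would release the OS resource, and the counter is a Threads.Atomic so the finalizer does no IO, locking, or yielding.

```julia
# Hypothetical leak counter; atomic so the finalizer can update it safely.
const LEAKED = Threads.Atomic{Int}(0)

# Hypothetical handle: explicit close is the real cleanup path, and the
# finalizer is only a best-effort fallback that also records the leak.
mutable struct LeakCheckedHandle
    fileid::Int
    isopen::Bool
    function LeakCheckedHandle(fileid::Int)
        h = new(fileid, true)
        finalizer(h) do x
            if x.isopen
                Threads.atomic_add!(LEAKED, 1)   # someone forgot to close
                close(x)
            end
        end
        return h
    end
end

# Idempotent close; a real implementation would release the OS resource here.
function Base.close(h::LeakCheckedHandle)
    h.isopen = false
    return nothing
end
Base.isopen(h::LeakCheckedHandle) = h.isopen

good = LeakCheckedHandle(1)
close(good)      # closed explicitly, so the finalizer has nothing to do
finalize(good)   # run finalizers now, only for demonstration

leaky = LeakCheckedHandle(2)
finalize(leaky)  # never closed: the fallback closes it and counts the leak
@show LEAKED[]   # 1
```

A real program might also report the leak from a separate task; the counter just keeps the sketch free of IO.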

That’s a good question - probably only some informational things that you don’t require for correctness of your program.

Pretty much the only thing that should be done in a finalizer is calling free on memory that isn’t owned by Julia. For example, BigInts need a finalizer to free the underlying memory, because GMP calls malloc directly rather than letting Julia hijack it to use its own allocator.


Even that can lead to a memory leak if the GC never runs. Memory allocated like that should be freed explicitly as soon as the semantics of the program allow it (I’m aware that this is lacking specifics). I think BigInt only does it this way because the finalizer is the language’s own mechanism for dealing with its memory (and, similar to the IO case above, it is the only realistic way to implement a sort of fallback).
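A sketch of that combination for foreign memory, using Libc.malloc and Libc.free: an explicit free! that the program calls as soon as it is done, with the finalizer registered only as a fallback. CBuffer and free! are hypothetical names.

```julia
# Hypothetical wrapper around memory that Julia's GC does not own.
mutable struct CBuffer
    ptr::Ptr{UInt8}
    len::Int
    function CBuffer(len::Integer)
        p = Ptr{UInt8}(Libc.malloc(len))
        p == C_NULL && throw(OutOfMemoryError())
        b = new(p, len)
        finalizer(free!, b)   # fallback only: may run late, or never
        return b
    end
end

# Explicit, idempotent release; call this as soon as the buffer is done.
function free!(b::CBuffer)
    if b.ptr != C_NULL
        Libc.free(b.ptr)
        b.ptr = C_NULL
    end
    return nothing
end

buf = CBuffer(16)
GC.@preserve buf unsafe_store!(buf.ptr, 0x2a)   # keep buf alive while using its pointer
free!(buf)   # deterministic release
free!(buf)   # safe to call again; a later finalizer run also finds nothing to do
```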


Yeah, see the IOStream constructors in iostream.jl.

My guess is that it won’t be a problem. If memory is being depleted and triggers the GC, each garbage BigInt instance will be collected and the associated finalizer will free the memory the GC doesn’t know about. If not enough memory is being used to trigger the GC, leaving that memory unfreed until the process is over doesn’t affect the program.

File handling came to mind as an example because a file staying open when you don’t need it anymore actually disrupts the program, like when it tries to open the file again.


Good points. Just a few comments to collect my understanding:

  1. Finalizers hook into the GC, which is concerned with memory as a resource, and thus any memory that an object holds outside of the GC must be freed as well when the object gets garbage collected, i.e., the finalizer is the right spot for that.

  2. Any other resources (besides memory) are not the concern of the GC. Yet, when an object gets garbage collected without freeing other resources it held, these would leak, i.e., the finalizer is your hook (of last resort) to free those as well.

    On the other hand, a correct program cannot rely on the GC to handle such resources, as the GC is only concerned with memory and is only guaranteed to run when memory becomes tight. Thus, a proper program needs to handle non-memory resources itself, e.g., using open(...) do handle ... end or similar.