Nulling of object references / unset!

When interfacing with C code, it is often necessary to unset variables, i.e. reset object references to C_NULL before passing them as structs to C.

This can be done by

@inline function unset_field!(x::T)
    # `T` and `offset` are placeholders: the byte offset of the field is hard-coded per type
    unsafe_store!(convert(Ptr{Ptr{Void}}, pointer_from_objref(x) + offset), C_NULL, 1)
end

This compiles down to the correct 1-2 native instructions. However, this is not really convenient, as I need to compute the correct offset into the struct by hand, and write a new function for every case.

Any better solutions involving macros? The offsets really must be computed at compile time, but I am happy to segfault on misuse.

In fact, I would really like to get support for unset!. Every non-bitstype is effectively C_NULL-able anyway (uninitialized array), so type-safety is not an issue here. Make it convenient to use.
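The hand-computed offset can in principle be derived from `fieldoffset`. The following is only a minimal sketch, assuming Julia 1.x (where the thread's `Void` is spelled `Cvoid` for pointers); the type `Handle` and the `Val`-based signature are hypothetical and chosen for illustration. It relies on the same unsupported layout trick as the function above, and it assumes the field is stored as a plain pointer, which the `isbitstype` check does not fully guarantee.

```julia
# Hypothetical example type. The incomplete constructor means an undefined
# `payload` is a state that ordinary Julia code can already produce.
mutable struct Handle
    id::Int
    payload::Vector{Int}
    Handle(id) = new(id)
    Handle(id, payload) = new(id, payload)
end

# Null the reference stored in field `name` of the mutable object `x`.
# UNSAFE: depends on object layout; this is not a supported operation.
@generated function unset_field!(x::T, ::Val{name}) where {T, name}
    i = findfirst(==(name), fieldnames(T))
    i === nothing && return :(throw(ArgumentError("no such field")))
    isbitstype(fieldtype(T, i)) &&
        return :(throw(ArgumentError("a bits field holds no reference")))
    off = fieldoffset(T, i)   # computed once per (T, name), when the method is generated
    quote
        GC.@preserve x begin
            p = convert(Ptr{Ptr{Cvoid}}, pointer_from_objref(x) + $off)
            unsafe_store!(p, C_NULL)
        end
        return x
    end
end

h = Handle(1, [1, 2, 3])
unset_field!(h, Val(:payload))
isdefined(h, :payload)   # queries whether the field currently holds a reference
```

Because the offset is spliced into the generated body as a literal, no lookup happens at run time; misuse on a field that is not a plain pointer is left to crash.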

Since the Julia compiler assumes this operation is not possible and optimizes accordingly, it’s best to find other solutions (such as using a nullable Union{T, Void} field type, or declaring the field as Ptr{Void}) that won’t cause codegen to emit bad code.
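A minimal sketch of the first suggestion, assuming Julia 1.x (where `Void` is spelled `Nothing`); the names `Wrapper`, `clear!` and `total` are hypothetical and only serve the illustration.

```julia
# A field that may legitimately be "empty" is declared as a small Union.
mutable struct Wrapper
    payload::Union{Vector{Int}, Nothing}
end

# Clearing the field is an ordinary, fully supported assignment.
clear!(w::Wrapper) = (w.payload = nothing; w)

# Readers branch on `=== nothing`; inside each branch the type is known.
function total(w::Wrapper)
    p = w.payload
    return p === nothing ? 0 : sum(p)
end

w = Wrapper([1, 2, 3])
total(w)    # sums the payload
clear!(w)
total(w)    # takes the `nothing` branch
```

The compiler knows about both states here, so no assumption about always-initialized fields is violated. The trade-off is that the empty state is not a C NULL pointer in memory.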

What are the conditions where codegen would emit bad code? Do you have a minimal example where julia emits bad machine code?

Are you saying that isinitialized-checks could be optimized away, due to assumptions that are valid in julia’s type system but invalid on the llvm-level? I thought that the optimization pass that would consider optimizing these checks away has access to the llvm-code only, and hence cannot make such mistakes. [edit: This would be really, really scary, and I don’t understand Julia’s internals well enough to be sufficiently certain. Please tell me that such optimizations cannot happen?]

I naively assumed that most internal machinery of julia (especially the garbage collector and type inference) is capable of dealing with uninitialized references (null-pointers) anyway, since these can occur often (e.g. in arrays, partial constructors, etc).

Null-pointers for uninitialized object-references have the extremely tempting properties of being binary compatible with C-code that expects NULL-pointers for missing values (which almost all C and C++ code does), and being quite type-stable (type-inference will never box anything due to possible uninitialized references).

Unfortunately, “nothing” is not represented by a null-pointer in Union{T, Void}, where T is a non-bitstype. Hence, Union{T,Void} always generates different code from the unsafe_store!-deinitialization, and is not binary compatible with the usual C constructs. Using Julia object references instead of pointers in fields gives all the nice advantages of memory management by the garbage collector, and plays very nicely with the standard julia libraries.

An extremely simple example where explicit nulling/deinitialization makes sense, even without C compatibility:

Take the deque in the DataStructures package. They use a doubly linked list of blocks; each block contains a Vector{T}(capacity), and stores the interval (front, back) of valid entries. Assume that T is not bitstype. Initially, all entries are uninitialized (null-pointers); upon removal of an entry, front is incremented or back is decremented. However, the “removed” entries are still visible to the garbage collector and hence kept alive! (they get released when the entire block is destroyed, which might be pretty late)

In order to get rid of them, one would need to overwrite the reference with something. However, deque knows no constructor for a null-object of type T. There is such a constructor (put in a null-pointer if reference type, use uninitialized memory / NOP for bitstype), which is what allows us to write Vector{T}(capacity) in the first place; I was complaining about the lacking availability of this null-constructor for general purpose operations.
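The situation can be reproduced with a stripped-down block. This is a sketch, assuming Julia 1.x syntax (`Vector{T}(undef, capacity)` instead of `Vector{T}(capacity)`); `Block`, `push_back!` and `pop_front!` are hypothetical names and not the DataStructures implementation.

```julia
# One block of a deque-like structure: valid entries live in data[front:back].
mutable struct Block{T}
    data::Vector{T}
    front::Int
    back::Int
end

# For a non-bits T, every slot of the new vector starts out unassigned.
Block{T}(capacity::Int) where {T} = Block{T}(Vector{T}(undef, capacity), 1, 0)

function push_back!(b::Block, x)
    b.back += 1
    b.data[b.back] = x
    return b
end

function pop_front!(b::Block)
    x = b.data[b.front]
    b.front += 1    # the slot is logically free, but it still references x
    return x
end

b = Block{String}(4)
isassigned(b.data, 1)   # false: the slot was never written
push_back!(b, "a")
pop_front!(b)
isassigned(b.data, 1)   # true: the "removed" entry is still reachable
```

The last line is the point of the complaint: the block has no generic way to return slot 1 to the unassigned state it started in.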

You could argue that one should use Union{Void, T} for such cases; however, the fact that even the almost-std-lib people from DataStructures.jl could not be bothered to do this tells us something about the elegance of such an approach (again, using the requirement that this should have zero runtime overhead if types and @inbounds are correctly annotated).

You could also argue that we should modify the underlying array (delete from it). This has [edit: maybe, maybe not, maybe rarely, see youyichao’s comment, who knows much more about this] the unfortunate consequence of occasionally causing memmoves if we delete from the front, and preventing them is the entire point of the deque. The thresholds for memmoving the array contents upon deletion from front are compile-time constants (macros) in array.c, so there is no reasonable way of fiddling with them for the deque, if it is to be written in julia instead of C.

Please don’t spread wrong info about deleting from arrays anymore! The memmoves don’t happen frequently.

That said, nulling an array element is perfectly fine. Objects are completely different.
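For array elements, recent Julia versions contain an internal helper that does exactly this. The sketch below assumes `Base._unsetindex!` is present; it is unexported, undocumented, and may change or be missing in a given version, so the hypothetical wrapper `unset_slot!` checks for it first.

```julia
# Return slot `i` of `v` to the unassigned state, using an internal Base helper.
# ASSUMPTION: Base._unsetindex! exists in the running Julia version.
function unset_slot!(v::Vector, i::Int)
    isdefined(Base, :_unsetindex!) ||
        error("Base._unsetindex! is not available in this Julia version")
    Base._unsetindex!(v, i)
    return v
end

v = ["a", "b", "c"]
unset_slot!(v, 2)
isassigned(v, 2)   # queries whether slot 2 still holds a reference
```

For element types that are stored inline (bits types), there is no reference to clear, so a helper like this has nothing useful to do.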

Ok, thanks! Do you know an example/situation where nulling of object references might not be OK?

Regarding deletion from front of arrays causing memmoves because offset exceeds the threshold: Sorry, I’ll try not to talk about this anymore, and try not to spread false info. I thought that this had been the design consideration behind the datastructure-deque’s decision to store front/back in julia instead of leaving it to the array.c implementation (guarantee that pointers stay valid and never get moved).

The current implementation takes advantage of fields that are always initialized in the constructor, so violating that will certainly be an issue. I don’t think there’s much in the runtime that will cause issues if you null a reference where the null state is possible to construct, but that’s definitely not guaranteed, and breakage can happen at any time (i.e. it currently doesn’t cause an issue only because the compiler is not smart enough to exploit it).

It would certainly be nice if storing a NULL pointer were something that we could take advantage of, not so much for C compatibility (mutating an object reference is already not allowed; reading is, though) but because a NULL check is an operation that the whole stack (from compiler to hardware) is optimized for. A Union{...,Void} (or any singleton type in place of the Void) should be able to achieve that, but we are certainly not there yet.

I see. So I should just define a never-called dummy inner constructor for T that leaves the C_NULL-able field uninitialized in order to tell the compiler/type inferer that the field might be null?

This would induce the overhead of possibly slower compilation/type inference, and possibly prevent removal of some null-pointer checks in future versions of julia that are smarter about propagating invariants from inner constructors, and should produce identical code in current julia versions?
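For reference, the supported half of this idea looks as follows. This is a sketch assuming Julia 1.x; `CNode` is a hypothetical type. It only shows that an incomplete inner constructor makes "undefined" a legal state of the field; it does not make a later `unsafe_store!` of C_NULL into that field a supported operation.

```julia
# `next` may be left undefined by the one-argument constructor.
mutable struct CNode
    value::Int
    next::CNode
    CNode(value::Int) = new(value)                    # leaves `next` undefined
    CNode(value::Int, next::CNode) = new(value, next)
end

a = CNode(1)
isdefined(a, :next)   # false: the field was never initialized
b = CNode(2, a)
isdefined(b, :next)   # true

# Reading an undefined field is checked and throws UndefRefError.
try
    a.next
catch err
    err isa UndefRefError
end
```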

No, it can crash in a future version.