How to undef a vector element

Now I’m going to describe what happens when you use Union{Stuff, Nothing} as the element type of the array. Let’s create the array, and probe the pointers.

julia> s = Vector{Union{Stuff,Nothing}}(nothing, 5)
5-element Vector{Union{Nothing, Stuff}}:
 nothing
 nothing
 nothing
 nothing
 nothing

We can then populate the first three elements of the array once again.

julia> s[1] = Stuff(1, [], [])
Stuff(1, Float64[], String[])

julia> s[2] = Stuff(2, [3.0], ["a"])
Stuff(2, [3.0], ["a"])

julia> s[3] = Stuff(3, [4.0, 5.0], ["a", "b"])
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> s
5-element Vector{Union{Nothing, Stuff}}:
 Stuff(1, Float64[], String[])
 Stuff(2, [3.0], ["a"])
 Stuff(3, [4.0, 5.0], ["a", "b"])
 nothing
 nothing

Probing the pointers, we see they are now spaced 8 bytes apart rather than 24 bytes apart. We also cannot load the pointer directly.

julia> pointer(s, 2) - pointer(s, 1)
0x0000000000000008

julia> pointer(s, 3) - pointer(s, 2)
0x0000000000000008

julia> pointer(s, 4) - pointer(s, 3)
0x0000000000000008

julia> ptr = pointer(s)
Ptr{Union{Nothing, Stuff}} @0x00007f9304503518

julia> unsafe_load(ptr)
ERROR: pointerref: invalid pointer type

I’m going to speculate that the pointer points to another pointer. Let’s see what happens.

julia> ptr = Ptr{Ptr{Nothing}}(ptr)
Ptr{Ptr{Nothing}} @0x00007f9304503518

julia> unsafe_load(ptr, 1)
Ptr{Nothing} @0x00007f93047b15d0

julia> unsafe_load(ptr, 2)
Ptr{Nothing} @0x00007f93047c8f10

julia> unsafe_load(ptr, 3)
Ptr{Nothing} @0x00007f93047e4a10

julia> unsafe_load(ptr, 4)
Ptr{Nothing} @0x00007f9307e17008

julia> unsafe_load(ptr, 5)
Ptr{Nothing} @0x00007f9307e17008

Hmm… the last two elements, which are both nothing are a pointer to the same thing, while the first three differ! Let’s see if we can load the first three elements.

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 1)))
Stuff(1, Float64[], String[])

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 2)))
Stuff(2, [3.0], ["a"])

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 3)))
Stuff(3, [4.0, 5.0], ["a", "b"])

Do not try to load elements 4 and 5 as Stuff! Julia will crash.

Julia must know what is Stuff and what is nothing.

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 1))
Stuff(1, Float64[], String[])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 2))
Stuff(2, [3.0], ["a"])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 3))
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 4))

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 5))

We did not need to tell Julia that elements 1, 2, and 3 were Stuff and that elements 4 and 5 were nothing. Here’s a hint about how that works:

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 1) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 2) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 3) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 4) - 8))
Ptr{Nothing} @0x00007f74a1d132c3

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 5) - 8))
Ptr{Nothing} @0x00007f74a1d132c3

Also note that

julia> sizeof(nothing)
0
4 Likes

Note that this is an implementation detail and only happens to work in this exact case. Other cases may not have a “clean” C_NULL as #undef or may be tracked differently. There may not even be an #undef value.

Returning an array element to #undef will never be a well defined operation in julia.

4 Likes

Wow, thanks @mkitti ! That’s awesome.

What makes some of the detailed posts above (including yours) so helpful is that they contain a detailed explanation of the machinery instead of a recipe. Thanks to all.

I don’t think you have to worry that anyone will actually try to implement that, other than for experimentation.

Thanks @Benny for your answers! It’s a bit more than one looking nicer than the other. The former is also less maintenance and hassle than the latter.

There is an internal Base._unsetindex function for that:

julia> x = [1, 'b', "c"]; Base._unsetindex!(x, 2)
3-element Vector{Any}:
   1
 #undef
    "c"

It’s used for example to implement deleteat!.

2 Likes

Wow, I did not know that!

I cannot emphasize enough that one really should not use internal functions, especially those that start with and underscore, or rely on this current memory layout.

1 Like

I posted this before: I think Base should export and document an “unset” function. This is essential for implementing hash tables. An open-addressing hash table stores keys and values in underlying arrays. Deleting a mutable key or value requires you to “unset” an index of an array. This is used in Base duct.jl as well as 3rd-party hash table packages like Dictionaries.jl and OrderedCollections.jl. So all these community packages have to use Julia’s internal details to implement dictionary-like data structures. This is a little unsettling.

3 Likes

That sounds like a compelling case. Is there a Github issue about this?

Haven’t had time to write up an issue yet. I found this when tracking down how delete!() is implemented for Dict. Maybe this is considered by devs to be too low-level to be exported to general users.

It definitely looks like it should be unsafe.