How to undef a vector element

Never thought it was. Just wanted to revert to the earlier state, but there appears to be no elegant way of doing that.

What I’ve done is to define something like:

EmptyStuff() = Stuff(0, Vector{Float64}(undef,0), Vector{String}(undef,0))

and use that in the loop

        if j < i 
            # release allocated memory
            s[j] = EmptyStuff()
        end

at least it becomes very clear what you want to do, and the main code does not become bloated.

I thought maybe this code would be more efficient than creating a new empty struct:

empty!(s::Stuff) = (resize!(s.y, 0); resize!(s.z, 0); return nothing)

But it seems not:

julia> @btime sv .= EmptyStuff.() setup=( sv = [Stuff(5, rand(1000), [randstring(20) for _ in 1:1000]) for _ in 1:1000]) evals=1;
  43.600 μs (2000 allocations: 109.38 KiB)
julia> @btime foreach(empty!, sv) setup=( sv = [Stuff(5, rand(1000), [randstring(20) for _ in 1:1000]) for _ in 1:1000]) evals=1;
  57.400 μs (0 allocations: 0 bytes)

Even though the latter requires no allocations. I do wonder if resize! allows memory to be reclaimed. Does anyone know?

1 Like

If it is implemented like C++ Vector I would guess not. The resize! only changes how many positions the Vector has readily available, however, the real memory allocated is often between this value and its double. Because otherwise Julia cannot guarantee the O(1) amortization promise on push!. In C++, I believe resizing down never frees the memory.

3 Likes

Note that this approach creates a new Stuff instance (and on the heap because it’s non-isbits) each time the method is called. It does save memory because the vectors are empty, but do it many times and you’ll have empty vectors and strings scattered all over memory.

You can instead make a const emptyStuff = EmptyStuff() to reuse, and if you can guarantee all “something” Stuff instances do not share its value, it can serve as a specific “nothing” for a Vector{Stuff}. But nothing has more built-in support like isnothing that you’ll have to implement specifically for emptyStuff, and you’ll have to be careful to never ever accidentally mutate the vectors in emptyStuff.

2 Likes

Thanks all for your suggestions. The

const emptystuff = EmptyStuff()

route seems most palatable… except that Stuff in my case is a parametric type with T<:AbstractFloat. Are there now better alternatives than what is suggested in https://discourse.julialang.org/t/can-one-make-a-constant-with-a-parametric-type/39444 ?

Whew, that really complicates things because there needs to be as many versions of emptystuff as there is Stuff{T}. Making an isempty to replace isassigned or isnothing shouldn’t be hard if your Stuff{T} is as simple as in OP, but it’ll be a chore to dispatch to the right emptystuff instance to put in each Vector{Stuff{T}}. Note how even that thread ultimately reaches the suggestion of using nothing, which is far simpler and seems to do the job for your case.

1 Like

Thanks @Benny . Yes, it’s simpler and I’ve used that trick in the past, but it’s ugly. Guess there are good reasons for the design decision not to allow my preferred solution, but ugh.

Like I said, #undefing doesn’t really happen for isbits elements, and making it so would unconditionally add more work that takes more time. So making elementwise #undefing doable isn’t really elegant or consistent.

I do also think Stuff{T} looks more elegant than Union{Stuff{T}, Nothing}, but that’s just on the surface. Making a bunch of emptystuff instances for every concrete Stuff{T} is complicated in its own way, and the elegant solution is to dedicate 1 concrete type to represent nothing, ie Nothing.

The other advantage of using types is clarity. Sure, you’re fine doing isassigned or isempty all the time for this use case, but in cases where you have a fully initialized array, you could elide those checks. But do you really want to comb through entire arrays to check for #undefs or emptystuffs? Stuff{T} versus Union{Stuff{T}, Nothing} immediately tells you which, and you can even annotate methods for each.

2 Likes

There are some very good solutions above in this thread. I personally would just use the Union{Stuff,Nothing} and initialize the array with nothing. Just pretend that nothing is undef.

s = Vector{Union{Stuff,Nothing}}(nothing, B)

Now I’m going to actually answer your question about “how to undef a vector element”. Let me preface this by saying that this is a very, very bad idea. Any code that uses the below deserves a fiery death, and I am not liable for your computer exploding.

OK, now that you’ve been warned. Let’s start with just creating a small array and populating it, leaving one element one defined so we can inspect it later.

julia> s = Vector{Stuff}(undef, 4)
4-element Vector{Stuff}:
 #undef
 #undef
 #undef
 #undef


julia> s[1] = Stuff(1, [], [])
Stuff(1, Float64[], String[])

julia> s[2] = Stuff(2, [3.0], ["a"])
Stuff(2, [3.0], ["a"])

julia> s[3] = Stuff(3, [4.0, 5.0], ["a", "b"])
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> s
4-element Vector{Stuff}:
    Stuff(1, Float64[], String[])
    Stuff(2, [3.0], ["a"])
    Stuff(3, [4.0, 5.0], ["a", "b"])
 #undef

The first part of this is to understand what is the memory layout behind the Vector s. We can ask Julia for a pointer, and it will happily oblige. We can try to load information from the pointer.

julia> pointer(s)
Ptr{Stuff} @0x00007f4b688bfce0

julia> unsafe_load(ptr)
Stuff(1, Float64[], String[])

julia> unsafe_load(ptr, 2)
Stuff(2, [3.0], ["a"])

julia> unsafe_load(ptr, 3)
Stuff(3, [4.0, 5.0], ["a", "b"])

Since we can recover the information from the pointer, we have shown that the pointer is sufficient to retrieve information about objects in the array. I did not load index 4 from the pointer. If you do, you will likely find that Julia will crash. That’s why the function is called unsafe_load. This is the first evidence that we are in dangerous territory. Turn back now!

So you are still reading? Fine. We’ll continue. First, let’s examine the pointer values by seeing how far they are apart.

julia> pointer(s, 2) - pointer(s, 1)
0x0000000000000018

julia> pointer(s, 3) - pointer(s, 2)
0x0000000000000018

julia> pointer(s, 4) - pointer(s, 3)
0x0000000000000018

julia> pointer(s, 4) - pointer(s, 3) |> Int
24

We see that the pointers are spaced 24 bytes apart. Where does that come from? Let’s take a closer look at Stuff.

julia> sizeof(Stuff)
24

julia> fieldoffset(Stuff, 1)
0x0000000000000000

julia> fieldoffset(Stuff, 2)
0x0000000000000008

julia> fieldoffset(Stuff, 3)
0x0000000000000010

We see that the pointer spacing matches the sizeof(Stuff) and the fields are each offset by 8 bytes. I’m going to wildly speculate that what is being stored is an Int64 followed by two pointers.

julia> const RawStuff = Tuple{Int, Ptr{Nothing}, Ptr{Nothing}}
Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}

julia> sizeof(RawStuff)
24

Now I will cast the pointer and try to load RawStuff.

julia> ptr = Ptr{RawStuff}(ptr)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> unsafe_load(ptr, 1)
(1, Ptr{Nothing} @0x00007f4b69541590, Ptr{Nothing} @0x00007f4b695963e0)

julia> unsafe_load(ptr, 2)
(2, Ptr{Nothing} @0x00007f4b6a2a4d10, Ptr{Nothing} @0x00007f4b6a2a4d50)

julia> unsafe_load(ptr, 3)
(3, Ptr{Nothing} @0x00007f4b6a5109c0, Ptr{Nothing} @0x00007f4b6a2b76d0)

julia> unsafe_load(ptr, 4)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> C_NULL
Ptr{Nothing} @0x0000000000000000

I’ve loaded the forbidden 4th element! It’s a Int64(0) followed by two null pointers (e.g. C_NULL). Now we know what undef actually looks like for Stuff.

Let’s try to clear out the memory now, first using the conventional method to let the garbage collector properly know it can reclaim memory.

julia> const emptystuff = Stuff(0, [], [])
Stuff(0, Float64[], String[])

julia> s[1] = emptystuff
Stuff(0, Float64[], String[])

julia> s[2] = emptystuff
Stuff(0, Float64[], String[])

julia> s[3] = emptystuff
Stuff(0, Float64[], String[])

julia> s
4-element Vector{Stuff}:
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
 #undef

julia> unsafe_load(ptr, 1)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 2)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 3)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 4)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

We see that the first three entries are all identical, but they are distinct from the 4th element that represents undef.

julia> const undefstuff = (0, C_NULL, C_NULL)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> undefstuff::RawStuff # type assertion
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> unsafe_store!(ptr, undefstuff, 1)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> s
4-element Vector{Stuff}:
 #undef
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
 #undef

Tada! The first the element is now undef. Let’s do the second and third element.

julia> unsafe_store!(ptr, undefstuff, 2)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> unsafe_store!(ptr, undefstuff, 3)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> s
4-element Vector{Stuff}:
 #undef
 #undef
 #undef
 #undef

Now s has returned to its prestine undef state. Doing what I just did is a bad idea!! We use a bunch of unsafe methods to do it. The details of how this works may change at any point. Do not do this.

8 Likes

Now I’m going to describe what happens when you use Union{Stuff, Nothing} as the element type of the array. Let’s create the array, and probe the pointers.

julia> s = Vector{Union{Stuff,Nothing}}(nothing, 5)
5-element Vector{Union{Nothing, Stuff}}:
 nothing
 nothing
 nothing
 nothing
 nothing

We can then populate the first three elements of the array once again.

julia> s[1] = Stuff(1, [], [])
Stuff(1, Float64[], String[])

julia> s[2] = Stuff(2, [3.0], ["a"])
Stuff(2, [3.0], ["a"])

julia> s[3] = Stuff(3, [4.0, 5.0], ["a", "b"])
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> s
5-element Vector{Union{Nothing, Stuff}}:
 Stuff(1, Float64[], String[])
 Stuff(2, [3.0], ["a"])
 Stuff(3, [4.0, 5.0], ["a", "b"])
 nothing
 nothing

Probing the pointers, we see they are now spaced 8 bytes apart rather than 24 bytes apart. We also cannot load the pointer directly.

julia> pointer(s, 2) - pointer(s, 1)
0x0000000000000008

julia> pointer(s, 3) - pointer(s, 2)
0x0000000000000008

julia> pointer(s, 4) - pointer(s, 3)
0x0000000000000008

julia> ptr = pointer(s)
Ptr{Union{Nothing, Stuff}} @0x00007f9304503518

julia> unsafe_load(ptr)
ERROR: pointerref: invalid pointer type

I’m going to speculate that the pointer points to another pointer. Let’s see what happens.

julia> ptr = Ptr{Ptr{Nothing}}(ptr)
Ptr{Ptr{Nothing}} @0x00007f9304503518

julia> unsafe_load(ptr, 1)
Ptr{Nothing} @0x00007f93047b15d0

julia> unsafe_load(ptr, 2)
Ptr{Nothing} @0x00007f93047c8f10

julia> unsafe_load(ptr, 3)
Ptr{Nothing} @0x00007f93047e4a10

julia> unsafe_load(ptr, 4)
Ptr{Nothing} @0x00007f9307e17008

julia> unsafe_load(ptr, 5)
Ptr{Nothing} @0x00007f9307e17008

Hmm… the last two elements, which are both nothing are a pointer to the same thing, while the first three differ! Let’s see if we can load the first three elements.

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 1)))
Stuff(1, Float64[], String[])

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 2)))
Stuff(2, [3.0], ["a"])

julia> unsafe_load(Ptr{Stuff}(unsafe_load(ptr, 3)))
Stuff(3, [4.0, 5.0], ["a", "b"])

Do not try to load elements 4 and 5 as Stuff! Julia will crash.

Julia must know what is Stuff and what is nothing.

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 1))
Stuff(1, Float64[], String[])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 2))
Stuff(2, [3.0], ["a"])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 3))
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 4))

julia> unsafe_pointer_to_objref(unsafe_load(ptr, 5))

We did not need to tell Julia that elements 1, 2, and 3 were Stuff and that elements 4 and 5 were nothing. Here’s a hint about how that works:

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 1) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 2) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 3) - 8))
Ptr{Nothing} @0x00007f74acef7442

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 4) - 8))
Ptr{Nothing} @0x00007f74a1d132c3

julia> unsafe_load(Ptr{Ptr{Nothing}}(unsafe_load(ptr, 5) - 8))
Ptr{Nothing} @0x00007f74a1d132c3

Also note that

julia> sizeof(nothing)
0
4 Likes

Note that this is an implementation detail and only happens to work in this exact case. Other cases may not have a “clean” C_NULL as #undef or may be tracked differently. There may not even be an #undef value.

Returning an array element to #undef will never be a well defined operation in julia.

4 Likes

Wow, thanks @mkitti ! That’s awesome.

What makes some of the detailed posts above (including yours) so helpful is that they contain a detailed explanation of the machinery instead of a recipe. Thanks to all.

I don’t think you have to worry that anyone will actually try to implement that, other than for experimentation.

Thanks @Benny for your answers! It’s a bit more than one looking nicer than the other. The former is also less maintenance and hassle than the latter.

There is an internal Base._unsetindex function for that:

julia> x = [1, 'b', "c"]; Base._unsetindex!(x, 2)
3-element Vector{Any}:
   1
 #undef
    "c"

It’s used for example to implement deleteat!.

2 Likes

Wow, I did not know that!

I cannot emphasize enough that one really should not use internal functions, especially those that start with and underscore, or rely on this current memory layout.

1 Like

I posted this before: I think Base should export and document an “unset” function. This is essential for implementing hash tables. An open-addressing hash table stores keys and values in underlying arrays. Deleting a mutable key or value requires you to “unset” an index of an array. This is used in Base duct.jl as well as 3rd-party hash table packages like Dictionaries.jl and OrderedCollections.jl. So all these community packages have to use Julia’s internal details to implement dictionary-like data structures. This is a little unsettling.

3 Likes

That sounds like a compelling case. Is there a Github issue about this?

Haven’t had time to write up an issue yet. I found this when tracking down how delete!() is implemented for Dict. Maybe this is considered by devs to be too low-level to be exported to general users.