How to undef a vector element

Suppose that I create an uninitialized vector of objects of type Stuff and then assign some content to it. At some point in the future, I no longer need a particular element and want the garbage collector to free up the corresponding memory. I can achieve that like I do in the silly example below, but it is clunky. I would prefer to simply make the element undefined again. How would I achieve that?

struct Stuff
    x   ::  Int
    y   ::  Vector{ Float64 }
    z   ::  Vector{ String }
end


const B = 1_000_000

function dome()
    s = Vector{ Stuff }( undef, B )
    for i ∈ 1 : B
        s[i] = Stuff( 1, randn( 100 ), fill( "what do I do?", 13 ) ) 
        j = rand( 1:B )
        if j < i
            # release allocated memory
            s[j] = Stuff( 0, Float64[], String[] )    # I really just want to stick an undef in here
        end
    end
end


dome()

Maybe make const empty_stuff = Stuff(0, Float64[], String[]) and assign that to released locations?

Thanks; sure, but that’s the clunkiness I’m trying to avoid.

You could also have s = Vector{Union{Nothing,Stuff}}(undef, B) and then assign s[j] = nothing. In my mind that makes the type more clunky, but maybe you’re okay with that.

1 Like

Thanks @StevenWhitaker . That’s indeed not what I’m looking for. I was hoping to simply have something that behaves like undef in every way, e.g. I can call isassigned on it.

Still clunky, but you could create a vector of Ref and ask if the Ref isassigned:

function dome()
    s = fill(Ref{Stuff}(), B)
    for i ∈ 1 : B
        s[i] = Ref(Stuff( 1, randn( 100 ), fill( "what do I do?", 13 ) ))
        j = rand( 1:B )
        if j < i && isassigned(s[j])
            # release allocated memory
            s[j] = Ref{Stuff}()
        end
    end
end
1 Like

Nice answer. It’s apparently not as straightforward as I’d hoped / expected.

It’s not straightforward by design. #undef elements weren’t even instances that are created, undef is just a setting to not spend time creating instances when allocating an array. Given an non-isbits element type, you get something sensible; every element is #undef, and isassigned returns false. This sanity check makes sure indexing gets to an actual instance instead of segfaulting.

But given an isbits element type, every element is a random value interpreted from the pre-existing bits in the memory allocated for the array, and isassigned returns true. Indexing doesn’t need to jump across memory, so the sanity check doesn’t happen. Code like yours will not be adaptable to isbits elements because there’s no way to tell if an element was assigned or #undef just from the value.

So, an actual instance nothing exists for consistency. The type needs to be a Union{T, Nothing}, but small unions are well-optimized. You allocate an array of nothings with s = Vector{Union{Stuff, Nothing}}(nothing, B), and you check the element with isnothing(s[i]). You can assign nothing to an element to unambiguously express there is “nothing” there, and when the garbage collector is triggered, it will free the unassigned Stuff instances with no more references.

11 Likes

#undef is not a value

2 Likes

Thanks for the explanation, @Benny

Never thought it was. Just wanted to revert to the earlier state, but there appears to be no elegant way of doing that.

What I’ve done is to define something like:

EmptyStuff() = Stuff(0, Vector{Float64}(undef,0), Vector{String}(undef,0))

and use that in the loop

        if j < i 
            # release allocated memory
            s[j] = EmptyStuff()
        end

at least it becomes very clear what you want to do, and the main code does not become bloated.

I thought maybe this code would be more efficient than creating a new empty struct:

empty!(s::Stuff) = (resize!(s.y, 0); resize!(s.z, 0); return nothing)

But it seems not:

julia> @btime sv .= EmptyStuff.() setup=( sv = [Stuff(5, rand(1000), [randstring(20) for _ in 1:1000]) for _ in 1:1000]) evals=1;
  43.600 Îźs (2000 allocations: 109.38 KiB)
julia> @btime foreach(empty!, sv) setup=( sv = [Stuff(5, rand(1000), [randstring(20) for _ in 1:1000]) for _ in 1:1000]) evals=1;
  57.400 Îźs (0 allocations: 0 bytes)

Even though the latter requires no allocations. I do wonder if resize! allows memory to be reclaimed. Does anyone know?

1 Like

If it is implemented like C++ Vector I would guess not. The resize! only changes how many positions the Vector has readily available, however, the real memory allocated is often between this value and its double. Because otherwise Julia cannot guarantee the O(1) amortization promise on push!. In C++, I believe resizing down never frees the memory.

3 Likes

Note that this approach creates a new Stuff instance (and on the heap because it’s non-isbits) each time the method is called. It does save memory because the vectors are empty, but do it many times and you’ll have empty vectors and strings scattered all over memory.

You can instead make a const emptyStuff = EmptyStuff() to reuse, and if you can guarantee all “something” Stuff instances do not share its value, it can serve as a specific “nothing” for a Vector{Stuff}. But nothing has more built-in support like isnothing that you’ll have to implement specifically for emptyStuff, and you’ll have to be careful to never ever accidentally mutate the vectors in emptyStuff.

2 Likes

Thanks all for your suggestions. The

const emptystuff = EmptyStuff()

route seems most palatable… except that Stuff in my case is a parametric type with T<:AbstractFloat. Are there now better alternatives than what is suggested in https://discourse.julialang.org/t/can-one-make-a-constant-with-a-parametric-type/39444 ?

Whew, that really complicates things because there needs to be as many versions of emptystuff as there is Stuff{T}. Making an isempty to replace isassigned or isnothing shouldn’t be hard if your Stuff{T} is as simple as in OP, but it’ll be a chore to dispatch to the right emptystuff instance to put in each Vector{Stuff{T}}. Note how even that thread ultimately reaches the suggestion of using nothing, which is far simpler and seems to do the job for your case.

1 Like

Thanks @Benny . Yes, it’s simpler and I’ve used that trick in the past, but it’s ugly. Guess there are good reasons for the design decision not to allow my preferred solution, but ugh.

Like I said, #undefing doesn’t really happen for isbits elements, and making it so would unconditionally add more work that takes more time. So making elementwise #undefing doable isn’t really elegant or consistent.

I do also think Stuff{T} looks more elegant than Union{Stuff{T}, Nothing}, but that’s just on the surface. Making a bunch of emptystuff instances for every concrete Stuff{T} is complicated in its own way, and the elegant solution is to dedicate 1 concrete type to represent nothing, ie Nothing.

The other advantage of using types is clarity. Sure, you’re fine doing isassigned or isempty all the time for this use case, but in cases where you have a fully initialized array, you could elide those checks. But do you really want to comb through entire arrays to check for #undefs or emptystuffs? Stuff{T} versus Union{Stuff{T}, Nothing} immediately tells you which, and you can even annotate methods for each.

2 Likes

There are some very good solutions above in this thread. I personally would just use the Union{Stuff,Nothing} and initialize the array with nothing. Just pretend that nothing is undef.

s = Vector{Union{Stuff,Nothing}}(nothing, B)

Now I’m going to actually answer your question about “how to undef a vector element”. Let me preface this by saying that this is a very, very bad idea. Any code that uses the below deserves a fiery death, and I am not liable for your computer exploding.

OK, now that you’ve been warned. Let’s start with just creating a small array and populating it, leaving one element one defined so we can inspect it later.

julia> s = Vector{Stuff}(undef, 4)
4-element Vector{Stuff}:
 #undef
 #undef
 #undef
 #undef


julia> s[1] = Stuff(1, [], [])
Stuff(1, Float64[], String[])

julia> s[2] = Stuff(2, [3.0], ["a"])
Stuff(2, [3.0], ["a"])

julia> s[3] = Stuff(3, [4.0, 5.0], ["a", "b"])
Stuff(3, [4.0, 5.0], ["a", "b"])

julia> s
4-element Vector{Stuff}:
    Stuff(1, Float64[], String[])
    Stuff(2, [3.0], ["a"])
    Stuff(3, [4.0, 5.0], ["a", "b"])
 #undef

The first part of this is to understand what is the memory layout behind the Vector s. We can ask Julia for a pointer, and it will happily oblige. We can try to load information from the pointer.

julia> pointer(s)
Ptr{Stuff} @0x00007f4b688bfce0

julia> unsafe_load(ptr)
Stuff(1, Float64[], String[])

julia> unsafe_load(ptr, 2)
Stuff(2, [3.0], ["a"])

julia> unsafe_load(ptr, 3)
Stuff(3, [4.0, 5.0], ["a", "b"])

Since we can recover the information from the pointer, we have shown that the pointer is sufficient to retrieve information about objects in the array. I did not load index 4 from the pointer. If you do, you will likely find that Julia will crash. That’s why the function is called unsafe_load. This is the first evidence that we are in dangerous territory. Turn back now!

So you are still reading? Fine. We’ll continue. First, let’s examine the pointer values by seeing how far they are apart.

julia> pointer(s, 2) - pointer(s, 1)
0x0000000000000018

julia> pointer(s, 3) - pointer(s, 2)
0x0000000000000018

julia> pointer(s, 4) - pointer(s, 3)
0x0000000000000018

julia> pointer(s, 4) - pointer(s, 3) |> Int
24

We see that the pointers are spaced 24 bytes apart. Where does that come from? Let’s take a closer look at Stuff.

julia> sizeof(Stuff)
24

julia> fieldoffset(Stuff, 1)
0x0000000000000000

julia> fieldoffset(Stuff, 2)
0x0000000000000008

julia> fieldoffset(Stuff, 3)
0x0000000000000010

We see that the pointer spacing matches the sizeof(Stuff) and the fields are each offset by 8 bytes. I’m going to wildly speculate that what is being stored is an Int64 followed by two pointers.

julia> const RawStuff = Tuple{Int, Ptr{Nothing}, Ptr{Nothing}}
Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}

julia> sizeof(RawStuff)
24

Now I will cast the pointer and try to load RawStuff.

julia> ptr = Ptr{RawStuff}(ptr)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> unsafe_load(ptr, 1)
(1, Ptr{Nothing} @0x00007f4b69541590, Ptr{Nothing} @0x00007f4b695963e0)

julia> unsafe_load(ptr, 2)
(2, Ptr{Nothing} @0x00007f4b6a2a4d10, Ptr{Nothing} @0x00007f4b6a2a4d50)

julia> unsafe_load(ptr, 3)
(3, Ptr{Nothing} @0x00007f4b6a5109c0, Ptr{Nothing} @0x00007f4b6a2b76d0)

julia> unsafe_load(ptr, 4)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> C_NULL
Ptr{Nothing} @0x0000000000000000

I’ve loaded the forbidden 4th element! It’s a Int64(0) followed by two null pointers (e.g. C_NULL). Now we know what undef actually looks like for Stuff.

Let’s try to clear out the memory now, first using the conventional method to let the garbage collector properly know it can reclaim memory.

julia> const emptystuff = Stuff(0, [], [])
Stuff(0, Float64[], String[])

julia> s[1] = emptystuff
Stuff(0, Float64[], String[])

julia> s[2] = emptystuff
Stuff(0, Float64[], String[])

julia> s[3] = emptystuff
Stuff(0, Float64[], String[])

julia> s
4-element Vector{Stuff}:
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
 #undef

julia> unsafe_load(ptr, 1)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 2)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 3)
(0, Ptr{Nothing} @0x00007f4b6981ce10, Ptr{Nothing} @0x00007f4b69511810)

julia> unsafe_load(ptr, 4)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

We see that the first three entries are all identical, but they are distinct from the 4th element that represents undef.

julia> const undefstuff = (0, C_NULL, C_NULL)
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> undefstuff::RawStuff # type assertion
(0, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)

julia> unsafe_store!(ptr, undefstuff, 1)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> s
4-element Vector{Stuff}:
 #undef
    Stuff(0, Float64[], String[])
    Stuff(0, Float64[], String[])
 #undef

Tada! The first the element is now undef. Let’s do the second and third element.

julia> unsafe_store!(ptr, undefstuff, 2)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> unsafe_store!(ptr, undefstuff, 3)
Ptr{Tuple{Int64, Ptr{Nothing}, Ptr{Nothing}}} @0x00007f4b688bfce0

julia> s
4-element Vector{Stuff}:
 #undef
 #undef
 #undef
 #undef

Now s has returned to its prestine undef state. Doing what I just did is a bad idea!! We use a bunch of unsafe methods to do it. The details of how this works may change at any point. Do not do this.

9 Likes