Approaches to sentinel values in immutables

What are some approaches to define sentinel values in immutables without introducing type instabilities? Am asking for DynamicExpressions.jl to try to reduce allocations in the branch allowing n-degree nodes.

Some context: I try to avoid Union{T,Nothing} for possibly-defined values in immutables, because of the potential for cascading type instabilities when union splitting fails. I have adopted a few patterns to avoid this, one being to explicitly store a field that marks a value as defined, like:

struct A{T}
    x::T
    isdefined::Bool
end

I will then explicitly check a.isdefined before accessing x. While a little bit less convenient, this seems to be same speed as an good compiler-inlined union splitting, while saving us from the potential cascade of type instabilities. Since, if not careful, it is really easy to get hit by a performance problem using unions:

julia> struct B{T}
           x::NTuple{3,Union{T,Nothing}}
       end

julia> # Sum over Vector{B{T}}
       bar(bs) = sum(bs) do b
           @something(b.x[1], 0.0) + @something(b.x[2], 0.0) + @something(b.x[3], 0.0)
       end;

julia> @btime bar(bs) setup=(bs =  [B{Float64}(ntuple(_ -> rand(Bool) ? randn() : nothing, 3)) for _ in 1:100_000]);
  4.688 ms (149141 allocations: 2.28 MiB)

Compare this to the isdefined version:

julia> struct A{T}
           x::NTuple{3,T}
           isdefined::NTuple{3,Bool}
       end

julia> # Sum over Vector{A{T}}
       foo(as) = sum(as) do a
           (a.isdefined[1] ? a.x[1] : 0.0) + (a.isdefined[2] ? a.x[2] : 0.0) + (a.isdefined[3] ? a.x[3] : 0.0)
       end;

julia> @btime foo(as) setup=(as =  [A((randn(3)...,), (rand(Bool, 3)...,)) for _ in 1:100_000]);
  94.708 ÎĽs (0 allocations: 0 bytes)

So it is about a 50x difference just based on using an explicit mask field.

The main issue with this is that I don’t know how to generically initialise “undefined” values in immutables. How should one define that for arbitrary types? This can be done in in mutables as you can just leave it #undef. But, what about immutables?

For example, take a recursive struct with children held in a tuple (the tuple being the immutable of interest):

# Defines a tree with up to 3 children
mutable struct Node{T}
    x::T
    children::NTuple{3,Node{T}}
    isdefined::NTuple{3,Bool}

    Node{T}() where {T} = new{T}()
    Node(x::T, c::NTuple{3,Node{T}}, def::NTuple{3,Bool}) where {T} = new{T}(x, c, def)
end

How would we initialise only some of the fields in the children tuple? I don’t want to make an empty Node() – that’s too expensive. And I don’t want to use Union{Node{T},Nothing} for the reasons above. So… what can we do?

One approach I have started to do is set up “poisoned” nodes that will not add allocations, but still result in some sort of downstream error. The simplest is to create a cycle - which will hopefully cause a stack overflow for improper accesses in a downstream code:

node = Node{Float64}()
node.x = 0.5
node.isdefined = (true, false, false)

# What to do for undefined `node.children`? Create a cycle!
node.children = (Node{Float64}(), node, node)

Now, anybody who tries to recurse through the children without checking .isdefined, they will get a stack overflow error. The bug stays loud and we don’t add any allocations, and this is generic across T.

However, it feels hacky to do this. And a new user might be confused by the specific error.

Are there other approaches to solve this? (I just wish there was a way to use #undef within an immutable.)

5 Likes

This is possible for custom structs, just not tuples, as long as the fields are mutable. Though the limitations of new mean all the undefs have to be at the end, you can’t have, say, the second child undef while the first and third are defined.

julia> struct Children{NT}
           child1::NT
           child2::NT
           child3::NT
           Children{NT}() where {NT} = new{NT}()
       end

julia> mutable struct Node{T}
           x::T
           isdefined::NTuple{3,Bool}
           children::Children{Node{T}}
           Node{T}(x) where {T} = new{T}(x, (false, false, false), Children{Node{T}}())
       end

julia> Node{Float64}(1.0)
Node{Float64}(1.0, (false, false, false), Children{Node{Float64}}(#undef, #undef, #undef))

With this approach, you don’t really need the separate isdefined field, you can just do isdefined(node.children, :child2) et cetera

Is there any way to make the number of fields in that pseudo-tuple a type parameter? So that you dont have to write out child1, child2, child3 explicitly.

(I’m trying to make a D-degree node in a tree, for some type param D, hence the tuple. But maybe there’s a way to get #undef for tuples too?)

Also are there allocations on Children{NT}() since the fields are technically objects on the heap? (Am on phone so can’t check)

I’m sure there’s a way using a bunch of unsafe_, but I don’t know how.

As for parametrizing the number of fields, there’s always the option of writing a macro to define a slew of Children1{NT}, Children2{NT}, and so on. I acknowledge it’s not very smooth. I don’t see another way…

Only the allocations of the actual Nodes you put in the struct. Under the hood, #undef is just the null pointer, not a pointer to some freshly allocated sentinel object or something.

1 Like

Thanks!

Yes if anybody knows how to do some sort of unsafe_ internal call to set up null pointers, I’d be very in trying.

The macro sounds like an option though I think it would mean another macro for Node3{T} and so on, so definitely not ideal.

I’d generally say don’t be afraid of the small union, but as you’ve discovered, tuples of unions don’t split as you’d hope. Unions of tuples, however, will!

But with a three tuple, you’re looking at a union of eight possible tuple types… which, yeah, that’s going to be a pain. Do you actually need to support all eight heterogeneous cases?

1 Like

My solution, used in ParseUnparse.jl, was to use the same implementation you propose (an additional Bool field), but the issues you raise are addressed by making it a collection type, named Optional. An AbstractVector, to be specific.

Perhaps there should be a dedicated package for Optional, one problem is that the name is not available, although the registered packages are unused and unmaintained AFAIK, so it should be possible to repurpose one of them after a major version bump.

The code is here:

Does this design address all of your concerns?

An alternative implementation approach (used in an earlier, unreleased, version of ParseUnparse.jl) is to rely on undef initialization of mutable structs, however that’s obviously always or almost always less efficient than the above:

1 Like

It doesn’t appear to be cascading type instability. The tuple of unions type for the x field shows up as red, but the indexing narrows the type down to Union{Float64, Nothing} and @something nails that down to Float64:

julia> barkernel(b) = begin
           @something(b.x[1], 0.0) + @something(b.x[2], 0.0) + @something(b.x[3], 0.0)
       end;

julia> @code_warntype barkernel(B{Float64}(ntuple(_ -> rand(Bool) ? randn() : nothing, 3)))
...

It stood out to me that there were 1.49 allocations per element as if rand(Bool) did something, so I checked less random cases:

julia> bs = [B{Float64}((nothing, nothing, nothing)) for _ in 1:100_000]; @btime bar($bs)
  2.977 ms (1 allocation: 16 bytes)
0.0

julia> bs = [B{Float64}((1.0, nothing, nothing)) for _ in 1:100_000]; @btime bar($bs)
  3.647 ms (100001 allocations: 1.53 MiB)
100000.0

julia> bs = [B{Float64}((1.0, 2.0, nothing)) for _ in 1:100_000]; @btime bar($bs)
  4.496 ms (200001 allocations: 3.05 MiB)
300000.0

julia> bs = [B{Float64}((1.0, 2.0, 3.0)) for _ in 1:100_000]; @btime bar($bs)
  5.268 ms (300001 allocations: 4.58 MiB)
600000.0

So there’s this 1 overall 16-byte allocation (probably the sum, weirdly init=0.0 makes it 48 bytes in 3 allocations), then 1 allocation (~16 bytes) per Float64 element, suggesting that it’s happening in a branch of @something after a nothing check. Checking @code_llvm barkernel(B{Float64}(ntuple(_ -> false ? randn() : nothing, 3))), this is what @something(b.x[1], 0.0) does:

; │┌ @ tuple.jl:31 within `getindex`
    %getfield = call ptr @ijl_get_nth_field_checked(ptr nonnull %"b::B.x", i64 0)
; │└
; │ @ some.jl:159 within `macro expansion`
   %getfield.tag_addr = getelementptr inbounds i64, ptr %getfield, i64 -1
   %getfield.tag = load atomic i64, ptr %getfield.tag_addr unordered, align 8
   %0 = and i64 %getfield.tag, -16
   %1 = inttoptr i64 %0 to ptr
   %exactly_isa.not.not = icmp eq ptr %1, @"+Core.Nothing#8512.jit"
   br i1 %exactly_isa.not.not, label %L17, label %guard_pass49

L17:                                              ; preds = %guard_pass49, %top
   %value_phi4 = phi double [ 0.000000e+00, %top ], [ %getfield.unbox, %guard_pass49 ]
...

guard_pass49:                                     ; preds = %top
   %getfield.unbox = load double, ptr %getfield, align 8
   br label %L17
...

The branch only contains loading a Float64 instance from the tuple element, but I’m not sure how that allocates. The structure of B{Float64} itself already seems boxed, and I’m not sure how that’s related:

julia> sizeof(B{Float64}) # just a pointer
8

julia> sizeof(NTuple{3,Union{Float64,Nothing}}) # an abstract type with many subtypes
ERROR: Argument is an incomplete Tuple type and does not have a definite size.
...

julia> struct B3{T}
         x1::Union{T, Nothing}
         x2::Union{T, Nothing}
         x3::Union{T, Nothing}
       end

julia> sizeof(B3{Float64}) # inlined 3 pairs of field and padded type tag
48

More context – A{T} stuff was a toy example but my actual usecase is DynamicExpressions.jl Node type, shown here (in the upcoming v2):

The children field is the one of interest. It seems to be very fast with the current approach of using node.children = (new_node, node) for a node with .degree == 1 (this will create a cycle - which will purposefully cause downstream errors should the user improperly access children). But the error caused by this - a stack overflow - is obviously not clear compared to the one from #undef or nothing.

On several occasions, others have recommended that I try Union{Node{T,D},Nothing} for storing children, I think due to slightly optimistic thinking about the compiler’s ability to split unions. I have put in a lot of effort in the past to try to make this work based on these recommendations. But it always results in my code being significantly slower, and can even result in propagating type instabilities throughout SymbolicRegression.jl when the unions start combining into composite types. Which is why I am keen to avoid union types altogether now.

Though I remain very curious if there are other non-union approaches to this!

Have you tested the performance of using a Vector instead of an NTuple? You can easily initialize a Vector with undef.

You can, but the uninitialized values are indistinguishable from normal instances of isbits types like Float64. The uninitialized values for an element type of Union{T, Nothing} is nothing, so it’ll work, but I think part of the point is trying to avoid an allocation. Granted, Tuple{Union{T, Nothing}} is strictly an abstract type so such a field allocates too, and it can’t possibly have the isbits-Union optimization that struct fields and Memory elements have; SVector{3, Union{T, Nothing}} also has that problem. Not really sure how it works but the instances have different concrete types and sizes:

julia> Base.summarysize.(NTuple{3, Union{Nothing, Float64}}[
               (nothing, nothing, nothing)
               (1.0, nothing, nothing)
               (1.0, 2.0, nothing)
               (1.0, nothing, 3.0)
               (1.0, 2.0, 3.0)])
5-element Vector{Int64}:
  0
  8
 16
 16
 24

@nsajko’s Optional seems like a promising approach. I’d make it more Ref-like rather than an AbstractVector, so that unwrapping is as simple as opt[]:

struct Optional{T}
    hasvalue::Bool
    value::T
    Optional{T}() where {T} = new{T}(false)
    Optional{T}(value::T) where {T} = new{T}(true, value)
end

Optional(value::T) where {T} = Optional{T}(value)

function Base.getindex(opt::Optional{T}) where {T}
    opt.hasvalue || error("attempt to unwrap option type with no value")
    return opt.value::T
end
julia> Optional(3.0)[]
3.0

julia> Optional{Float64}()[]
ERROR: attempt to unwrap option type with no value
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] getindex(opt::Optional{Float64})
   @ Main ./REPL[4]:2
 [3] top-level scope
   @ REPL[9]:1
3 Likes

Not sure if we’re on the same page or not. I was thinking of something along the lines of the following (this is just a sketch):

julia> mutable struct Node{T}
           x::T
       end

julia> x = Vector{Node}(undef, 3)
3-element Vector{Node}:
 #undef
 #undef
 #undef

julia> x[1] = Node(42)
Node{Int64}(42)

julia> x
3-element Vector{Node}:
    Node{Int64}(42)
 #undef
 #undef

Though in that example the element type is abstract, which might be bad…

I tried using vectors for storing children but they are really slow comparatively. The tuples with sentinel nodes just seem to let the compiler go to town.

Here’s some code where I explicitly branch over the different # of children a node could have, and unpack into a tuple of children of that length (masking the sentinel):

It’s a bit heavier on the compiler, but it is over an order of magnitude faster runtime for some of the operations I need compared to running an any over a vector.

Again, though, the downside is that the sentinel values here are kind of weird. A proper #undef within this children tuple would be cleaner.

Just recently this package surfaced:

The package tries to maintain the speed of SVector with the interface of Vector for small lengths. Perhaps a SmallVector{3,Node} will get the performance you need with nice syntax.

2 Likes

Thanks, looks useful.

Actually, does anybody know the performance characteristics difference between an SVector and a Tuple? I would guess that the compiler might have fewer constraints to exploit due to difference in storage? (Or maybe they literally compile to the same thing; it’s just that SVector <: AbstractArray?)

1 Like

Checking the SmallCollections package myself:

julia> using SmallCollections

julia> @btime sum(sum,ds) setup=(ds = 
  [SmallVector{3,Float64}(rand(rand(0:3))) for _ in 1:100_000]);
  866.848 ÎĽs (0 allocations: 0 bytes)

looks good, but the docs say SmallVectors need a default value, which means saving the memory initialization for empty vector slots might not be possible (if this is the issue).

A StaticArray is just a struct wrapping a tuple and implementing the array interface. As long as types are inferred they should perform identically and often compile down to identical code. But that also means getting an SVector{N,SomeMutableType} to contain an undef would be as hard and hacky as doing it for a Tuple.

1 Like

Yes I think @nsajko’s is really nice as an alternative to Union{T,Nothing}. And I like the dereferencing syntax too.

I feel that Union{T,Nothing} should almost never be recommended due to all the downstream carnage it can cause from slight misuse. Maybe it could be worth it to have a Optional{T} in Base so people don’t need to resort to unions.

This also reminds me a bit of SumTypes.jl too (overkill for this, though).

1 Like

Ah, since we need a default I guess we are back to square one. Bummer

1 Like