Approaches to sentinel values in immutables

danielwe · June 6, 2025, 3:04am

Early Julia had Nullable{T} in Base, but it was scrapped in favor of Union{Nothing,Some{T}} in Julia 0.7. See The fate of Nullable · Issue #22682 · JuliaLang/julia · GitHub. This seems to have been an attempt to address the fact that people were using Nullable for (data analyst’s) missing values, not just for optional values.

Dan · June 6, 2025, 3:09am

TBH stamping a default value of any object doesn’t feel like a performance hurdle to die on. Aren’t there lower cherries to pick in the surrounding algorithm?

MilesCranmer · June 6, 2025, 3:15am

Interesting. That PR discussion mostly looks at readability and usability so I guess the performance considerations weren’t on the radar yet? ~~Also looks like Nullable{T} was pretty much an alias for the union type, rather than a (Bool,T) composite~~

Maybe a new and improved Nullable{T} needs to come back

danielwe · June 6, 2025, 3:19am

No, it was exactly such a composite. See julia/base/nullabletype.jl at bc2e1ab017803664b6d7019d5916533f2bcb491e · JuliaLang/julia · GitHub

struct Nullable{T}
    hasvalue::Bool
    value::T

    Nullable{T}() where {T} = new(false)
    Nullable{T}(value::T, hasvalue::Bool=true) where {T} = new(hasvalue, value)
end

However, one proposal in the discussion was to turn it into an alias for the Union type rather than removing it completely.

MilesCranmer · June 6, 2025, 3:22am

This library defines the atom of SymbolicRegression.jl which generates and manipulates billions of per second. So, tiny choices like this really add up. In other words, allocating a default value has a really substantial impact on performance.

I wish! Alas, the cherry tree has been picked clean long ago.

The PR I reference doesn’t actually change performance at all; it’s just trying to preserve performance while generalising the structures.

MilesCranmer · June 6, 2025, 3:26am

Ah, I see. I wonder why performance wasn’t discussed at the time? (I thought union splitting was only introduced later?)

danielwe · June 6, 2025, 3:30am

I think union splitting/performant small unions was expected to be on the near-term horizon at the time, and an implicit premise for the discussion was that the union approach would be as fast or faster than Nullable. See The fate of Nullable · Issue #22682 · JuliaLang/julia · GitHub.

tecosaur · June 6, 2025, 3:44am

It would be nice if Union{T, Nothing} could be as fast as the Option{T}/Nullable{T} presented.

MilesCranmer · June 6, 2025, 3:57am

Weird. Is there an example where the union approach is actually faster?

danielwe · June 6, 2025, 4:16am

My guess reading that comment is that things like Array{Union{Nothing,T}} with non-isbits T can actually be represented as just Array{T} with #undef representing nothing, so you don’t need anything extra. That’s at least one way to make sense of Union{Nothing,T} being more optimal than Nullable{T} for non-isbits T. I don’t know if this is how it’s actually done.

Also, regardless of isbits, the struct-of-arrays representation of Array{Union{NullType,T}} is probably more optimal in many contexts than the array-of-structs you get with Array{Nullable{T}}. For example, if T is a hardware float or int, I guess a lot of array operations on Array{Union{Missing,T}} can vectorize.

nandoconde · June 6, 2025, 6:24am

The thread is long, but I saw SumTypes.jl mentioned and I would suggest taking a look at Moshi.jl.

Those Nullable{T} and option types are easy to construct and their contents easily extracted without performance impact.

tecosaur · June 6, 2025, 7:30am

I’ve seen Moshi.jl, and it does look pretty good.

Ultimately, I hope that Julia gets the capability required to make Union{T, Nothing} / Union{Some{T}, Nothing} fast without needing to reach for a package — I’m not sure exactly what form this would take: compiler optimizations, builtin sum types, or something else, but I have my fingers crossed that something will end up happening.

nsajko · June 6, 2025, 8:58am

That works already with my Optionals, exactly because they’re AbstractVectors. The AbstractArray interface translates the cartesian indexing to linear indexing, as usual.

MilesCranmer · June 6, 2025, 9:18am

I’m not sure I see this happening anytime soon. IMO Julia might benefit more from having a dedicated nullable type rather than hoping Union{T,Nothing} eventually becomes fast in all contexts. The original thread suggests similar hopes existed – that eventually the compiler would become powerful enough – but eight years later, it still hasn’t happened, and I’m not convinced it’s even possible without some sort of actual magic. So, perhaps reconsidering the original nullable type would be valuable. Not sure!

MilesCranmer · June 6, 2025, 10:57am

Combining @nsajko’s approach with a tuple, one could do:

struct Nullable{T}
    isdefined::Bool
    x::T

    Nullable(x::T) where {T} = new{T}(true, x)
    Nullable{T}() where {T} = new{T}(false)
end
@inline function Base.getindex(n::Nullable)
    n.isdefined || error("undefined")
    n.x
end

struct NullableNTuple{N,T}
    x::NTuple{N,Nullable{T}}

    NullableNTuple(x::NTuple{N,Nullable{T}}) where {N,T} = new{N,T}(x)
    NullableNTuple(x::NTuple{N,T}) where {N,T} = new{N,T}(ntuple(i -> Nullable(x[i]), Val(N)))
    NullableNTuple{N,T}(x::NTuple{M,T}) where {N,M,T} = new{N,T}(ntuple(i -> i <= M ? Nullable(x[i]) : Nullable{T}(), Val(N)))
end
@inline Base.getindex(nt::NullableNTuple, i) = nt.x[i][]

which would act like a tuple and (hopefully) allocate like one. And still gives you an undef error appropriately

I see this also lets you #undef for middle values in the tuple, which is more flexible than having a dedicated field and leaving it undefined.

julia> x = NullableNTuple((1, 2, 3))
NullableNTuple{3, Int64}((Nullable{Int64}(true, 1), Nullable{Int64}(true, 2), Nullable{Int64}(true, 3)))

julia> x[2]
2

julia> y = NullableNTuple{5,Float64}((1.0, 2.0, 3.0))
NullableNTuple{5, Float64}((Nullable{Float64}(true, 1.0), Nullable{Float64}(true, 2.0), Nullable{Float64}(true, 3.0), Nullable{Float64}(false, 0.0), Nullable{Float64}(false, 0.0)))

julia> y[5]
ERROR: undefined

And I guess if T is allocated on the heap, you don’t even need the separate isdefined:

julia> x = NullableNTuple{1,Base.RefValue{Float64}}(())
NullableNTuple{1, Base.RefValue{Float64}}((Nullable{Base.RefValue{Float64}}(false, #undef),))

julia> x.x[1].x
ERROR: UndefRefError: access to undefined reference

matthias314 · June 6, 2025, 12:37pm

Why exactly is this? Wouldn’t you need to allocate only one default value for each type?

MilesCranmer · June 6, 2025, 7:49pm

I assume you mean whether I tried something like the following?

const SENTINEL_NODES = Dict{Any,Any}()

#= inside function =#
    sentinel = get!(() -> Node{T}(), SENTINEL_NODES, Node{T})::Node{T}
    node.children = (new_node, sentinel)

So the answer is: I did actually try this, but that single dictionary lookup and dynamic type assertion was interestingly enough to severely throttle performance. (Keep in mind how hyper-optimized this library is… Tiny runtime calls like this actually have a huge performance hit!)

It is much faster to just do:

    node.children = (new_node, node)

which will also cause an error for improper access - just a stackoverflow one rather than an undefined reference.

matthias314 · June 6, 2025, 8:17pm

I was thinking in the context of default values for SmallVector discussed above. I don’t know how much you know about the parameter T in Node{T}. In the unlikely case that you know all the possible parameters, you can define something like

const void_NodeInt = Node{Int}()
SmallCollections.default(::Type{Node{Int}}) = void_NodeInt

If you don’t know T, but you do know that Node{T}() is valid, then I think you could say something like

@generated function SmallCollections.default(::Type{Node{T}}) where T
    node = Node{T}()
    :( $node )
end

This always returns the same element that was allocated only once.

MilesCranmer · June 6, 2025, 8:25pm

Interesting. Would allocating a mutable object in the preamble of a @generated, and then returning that allocation as if it were a constant, cause issues with world ages? It looks a bit hacky but maybe this pattern is fine?

And no I don’t know T, it could really be any type. I wouldn’t want to constrain this to a subset of types either.

danielwe · June 6, 2025, 8:35pm

It’s not just about performance. An option type requires explicit unwrapping and thus some judgment about what to do in the no value case. Not everyone will want to code that way in a dynamic language, but it’s nice to have the option, and with good syntax support for the most common cases it doesn’t have to be that much of a hassle (you could imagine opt ?? default meaning opt.hasvalue ? opt.value : default like in Swift and probably many others; opt ?> println meaning opt.hasvalue && println(opt.value); et cetera).

Topic		Replies	Views
Default constructor for any type? General Usage question	94	1881	February 29, 2024
Immutables with reference-fields: Why boxed? Internals & Design	20	3832	December 14, 2017
Why does arrayref throw? New to Julia question	89	2950	November 18, 2023
Memory layout of structs with mutable fields Performance	42	3247	March 3, 2019
Making an immutable array type? Internals & Design	19	6152	November 16, 2019

Approaches to sentinel values in immutables

Related topics