Early Julia had Nullable{T}
in Base, but it was scrapped in favor of Union{Nothing,Some{T}}
in Julia 0.7. See The fate of Nullable · Issue #22682 · JuliaLang/julia · GitHub. This seems to have been an attempt to address the fact that people were using Nullable
for (data analyst’s) missing values, not just for optional values.
TBH stamping a default
value of any object doesn’t feel like a performance hurdle to die on. Aren’t there lower cherries to pick in the surrounding algorithm?
Interesting. That PR discussion mostly looks at readability and usability so I guess the performance considerations weren’t on the radar yet? Also looks like Nullable{T}
was pretty much an alias for the union type, rather than a (Bool,T) composite
Maybe a new and improved Nullable{T}
needs to come back
No, it was exactly such a composite. See julia/base/nullabletype.jl at bc2e1ab017803664b6d7019d5916533f2bcb491e · JuliaLang/julia · GitHub
struct Nullable{T}
hasvalue::Bool
value::T
Nullable{T}() where {T} = new(false)
Nullable{T}(value::T, hasvalue::Bool=true) where {T} = new(hasvalue, value)
end
However, one proposal in the discussion was to turn it into an alias for the Union type rather than removing it completely.
This library defines the atom of SymbolicRegression.jl which generates and manipulates billions of per second. So, tiny choices like this really add up. In other words, allocating a default value has a really substantial impact on performance.
I wish! Alas, the cherry tree has been picked clean long ago.
The PR I reference doesn’t actually change performance at all; it’s just trying to preserve performance while generalising the structures.
Ah, I see. I wonder why performance wasn’t discussed at the time? (I thought union splitting was only introduced later?)
I think union splitting/performant small unions was expected to be on the near-term horizon at the time, and an implicit premise for the discussion was that the union approach would be as fast or faster than Nullable
. See The fate of Nullable · Issue #22682 · JuliaLang/julia · GitHub.
It would be nice if Union{T, Nothing}
could be as fast as the Option{T}
/Nullable{T}
presented.
Weird. Is there an example where the union approach is actually faster?
My guess reading that comment is that things like Array{Union{Nothing,T}}
with non-isbits T
can actually be represented as just Array{T}
with #undef
representing nothing
, so you don’t need anything extra. That’s at least one way to make sense of Union{Nothing,T}
being more optimal than Nullable{T}
for non-isbits T
. I don’t know if this is how it’s actually done.
Also, regardless of isbits, the struct-of-arrays representation of Array{Union{NullType,T}}
is probably more optimal in many contexts than the array-of-structs you get with Array{Nullable{T}}
. For example, if T
is a hardware float or int, I guess a lot of array operations on Array{Union{Missing,T}}
can vectorize.
The thread is long, but I saw SumTypes.jl mentioned and I would suggest taking a look at Moshi.jl.
Those Nullable{T}
and option types are easy to construct and their contents easily extracted without performance impact.
I’ve seen Moshi.jl, and it does look pretty good.
Ultimately, I hope that Julia gets the capability required to make Union{T, Nothing}
/ Union{Some{T}, Nothing}
fast without needing to reach for a package — I’m not sure exactly what form this would take: compiler optimizations, builtin sum types, or something else, but I have my fingers crossed that something will end up happening.
That works already with my Optional
s, exactly because they’re AbstractVector
s. The AbstractArray
interface translates the cartesian indexing to linear indexing, as usual.
I’m not sure I see this happening anytime soon. IMO Julia might benefit more from having a dedicated nullable type rather than hoping Union{T,Nothing}
eventually becomes fast in all contexts. The original thread suggests similar hopes existed – that eventually the compiler would become powerful enough – but eight years later, it still hasn’t happened, and I’m not convinced it’s even possible without some sort of actual magic. So, perhaps reconsidering the original nullable type would be valuable. Not sure!
Combining @nsajko’s approach with a tuple, one could do:
struct Nullable{T}
isdefined::Bool
x::T
Nullable(x::T) where {T} = new{T}(true, x)
Nullable{T}() where {T} = new{T}(false)
end
@inline function Base.getindex(n::Nullable)
n.isdefined || error("undefined")
n.x
end
struct NullableNTuple{N,T}
x::NTuple{N,Nullable{T}}
NullableNTuple(x::NTuple{N,Nullable{T}}) where {N,T} = new{N,T}(x)
NullableNTuple(x::NTuple{N,T}) where {N,T} = new{N,T}(ntuple(i -> Nullable(x[i]), Val(N)))
NullableNTuple{N,T}(x::NTuple{M,T}) where {N,M,T} = new{N,T}(ntuple(i -> i <= M ? Nullable(x[i]) : Nullable{T}(), Val(N)))
end
@inline Base.getindex(nt::NullableNTuple, i) = nt.x[i][]
which would act like a tuple and (hopefully) allocate like one. And still gives you an undef error appropriately
I see this also lets you #undef
for middle values in the tuple, which is more flexible than having a dedicated field and leaving it undefined.
julia> x = NullableNTuple((1, 2, 3))
NullableNTuple{3, Int64}((Nullable{Int64}(true, 1), Nullable{Int64}(true, 2), Nullable{Int64}(true, 3)))
julia> x[2]
2
julia> y = NullableNTuple{5,Float64}((1.0, 2.0, 3.0))
NullableNTuple{5, Float64}((Nullable{Float64}(true, 1.0), Nullable{Float64}(true, 2.0), Nullable{Float64}(true, 3.0), Nullable{Float64}(false, 0.0), Nullable{Float64}(false, 0.0)))
julia> y[5]
ERROR: undefined
And I guess if T
is allocated on the heap, you don’t even need the separate isdefined
:
julia> x = NullableNTuple{1,Base.RefValue{Float64}}(())
NullableNTuple{1, Base.RefValue{Float64}}((Nullable{Base.RefValue{Float64}}(false, #undef),))
julia> x.x[1].x
ERROR: UndefRefError: access to undefined reference
Why exactly is this? Wouldn’t you need to allocate only one default value for each type?
I assume you mean whether I tried something like the following?
const SENTINEL_NODES = Dict{Any,Any}()
#= inside function =#
sentinel = get!(() -> Node{T}(), SENTINEL_NODES, Node{T})::Node{T}
node.children = (new_node, sentinel)
So the answer is: I did actually try this, but that single dictionary lookup and dynamic type assertion was interestingly enough to severely throttle performance. (Keep in mind how hyper-optimized this library is… Tiny runtime calls like this actually have a huge performance hit!)
It is much faster to just do:
node.children = (new_node, node)
which will also cause an error for improper access - just a stackoverflow one rather than an undefined reference.
I was thinking in the context of default values for SmallVector
discussed above. I don’t know how much you know about the parameter T
in Node{T}
. In the unlikely case that you know all the possible parameters, you can define something like
const void_NodeInt = Node{Int}()
SmallCollections.default(::Type{Node{Int}}) = void_NodeInt
If you don’t know T
, but you do know that Node{T}()
is valid, then I think you could say something like
@generated function SmallCollections.default(::Type{Node{T}}) where T
node = Node{T}()
:( $node )
end
This always returns the same element that was allocated only once.
Interesting. Would allocating a mutable object in the preamble of a @generated
, and then returning that allocation as if it were a constant, cause issues with world ages? It looks a bit hacky but maybe this pattern is fine?
And no I don’t know T
, it could really be any type. I wouldn’t want to constrain this to a subset of types either.
It’s not just about performance. An option type requires explicit unwrapping and thus some judgment about what to do in the no value case. Not everyone will want to code that way in a dynamic language, but it’s nice to have the option, and with good syntax support for the most common cases it doesn’t have to be that much of a hassle (you could imagine opt ?? default
meaning opt.hasvalue ? opt.value : default
like in Swift and probably many others; opt ?> println
meaning opt.hasvalue && println(opt.value)
; et cetera).