Approaches to sentinel values in immutables

I’m not sure I see this is as equivalent. The Enzyme one just returns a DataType (e.g.), so is totally static and immutable. Whereas the one under consideration here is a fully instantiated and mutable live object.

These return values are not types, but enum values. But your point stands: they are immutable primitive values, not objects on the heap.

Found this surprising (and unfortunate) result. Even this tiny Optional{T} struct actually has quite a massive performance hit too when put in context of the library

Keep in mind how tiny this is though, relative to a typical application. I think it is likely great approach for most usecases. It’s just because we are in the nanosecond regime here (for copying a tree) we have to take some uglier but slightly faster approaches.

It seems the absolute fastest option here is to use cyclic references as a trick to induce user-side stack overflows upon improper access, rather than trying to create null pointers (the Optional{T} method) or Union{T,Nothing}.

If there’s a way to “compile-in” a null pointer I think that would work too. Basically a safer version of the @generated technique discussed above, if such a thing is allowed.

1 Like

What about my/your proposed approach? The one where there’s simply one additional Bool field per optional value. I’m aware that it’s a hack compared to a proper ADT solution, like Rust has, however it would avoid the allocations, which seem to be the bottleneck in your CPU profile.

That’s what I tried here. I actually even did less than the Bool and just used

struct Optional{T}
    x::T
    Optional(x::T) where {T} = new{T}(x)
    Optional{T}() where {T} = new{T}()
end

With this change:

    mutable struct Node{T,D} <: AbstractExpressionNode{T,D}
        degree::UInt8  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
        constant::Bool  # false if variable
        val::T  # If is a constant, this stores the actual value
        feature::UInt16  # (Possibly undefined) If is a variable (e.g., x in cos(x)), this stores the feature index.
        op::UInt8  # (Possibly undefined) If operator, this is the index of the operator in the degree-specific operator enum
-       children::NTuple{D,Node{T,D}}
+       children::NTuple{D,Optional{Node{T,D}}}
    end

But surprisingly, the change greatly worsened performance of copies (seen in the flamegraph above) relative to just having a child be set to point back to the parent.

It’s possible I made a silly mistake somewhere, but there didn’t appear to be anything obvious or any type instabilities. The flamegraph shows a GC warning next to all the Optional{T} calls which I guess makes sense given we are allocating quite a few of them for every node. I’m not sure why, as I thought it would be “free.” Is it something to do with an immutable pointing to a mutable still results in allocations even when left undefined?

1 Like

I’m imaging this could be the Julia compiler performing poorly due the recursive nature of your data structure?

In any case, can you show node_factory, set_children!, _ensure_optional? Or maybe just push the branch (this is for DynamicExpressions.jl, right)? I don’t promise to be able to help, but I’m curious.

1 Like

Quick check, something’s off, Optional{T} is doing more work than the identically structured Some{T} instantiation is:

julia> struct Optional{T}
           x::T
           Optional(x::T) where {T} = new{T}(x)
           Optional{T}() where {T} = new{T}()
       end

julia> using BenchmarkTools; @btime Optional($(zeros(4)))
  5.100 ns (1 allocation: 16 bytes)
Optional{Vector{Float64}}([0.0, 0.0, 0.0, 0.0])

julia> @btime Some($(zeros(4)))
  1.800 ns (0 allocations: 0 bytes)
Some([0.0, 0.0, 0.0, 0.0])

julia> @code_llvm Optional(zeros(4))
; Function Signature: (::Type{Main.Optional{T} where T})(Array{Float64, 1})
;  @ REPL[1]:3 within `Optional`
; Function Attrs: uwtable
define nonnull ptr @julia_Optional_833(ptr noundef nonnull readonly %"#ctor-self#::Type", ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
  %pgcstack = call ptr inttoptr (i64 140737208355296 to ptr)() #7
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8
  %"new::Optional" = call noalias nonnull align 8 dereferenceable(16) ptr @ijl_gc_pool_alloc_instrumented(ptr %ptls_load, i32 800, i32 16, i64 2730838875792) #6
  %"new::Optional.tag_addr" = getelementptr inbounds i64, ptr %"new::Optional", i64 -1
  store atomic i64 2730838875792, ptr %"new::Optional.tag_addr" unordered, align 8
  store ptr null, ptr %"new::Optional", align 8
  store atomic ptr %"x::Array", ptr %"new::Optional" release, align 8
  ret ptr %"new::Optional"
}

And that extra work vanishes if we exclude the undef constructor:

julia> struct Optional{T}
           x::T
           Optional(x::T) where {T} = new{T}(x)
           #Optional{T}() where {T} = new{T}()
       end

julia> using BenchmarkTools; @btime Optional($(zeros(4)))
  1.800 ns (0 allocations: 0 bytes)
Optional{Vector{Float64}}([0.0, 0.0, 0.0, 0.0])

julia> @code_llvm Optional(zeros(4)) # same as `Some` call
; Function Signature: (::Type{Main.Optional{T} where T})(Array{Float64, 1})
;  @ REPL[1]:3 within `Optional`
; Function Attrs: uwtable
define [1 x ptr] @julia_Optional_2817(ptr noundef nonnull readonly %"#ctor-self#::Type", ptr noundef nonnull align 8 dereferenceable(24) %"x::Array") #0 {
top:
  %0 = insertvalue [1 x ptr] zeroinitializer, ptr %"x::Array", 0
  ret [1 x ptr] %0
}

No earthly idea how a seemingly unrelated method would affect another.

2 Likes

Oh, I’ve seen this before. When there exists a constructor leaving a field undef, the struct can no longer be allocated inline even though it’s immutable.

julia> struct Optional{T}
           x::T
           Optional(x::T) where {T} = new{T}(x)
           Optional{T}() where {T} = new{T}()
       end

julia> Base.allocatedinline(Optional{Any})
false

julia> Base.allocatedinline(Some{Any})
true

I don’t think I ever understood why, but that’s how it is. So the performance you get is as if Optional were defined as mutable struct.

6 Likes

Is there an issue for that, kinda feels like there should.

2 Likes

Thanks. Very odd. Correct me if I’m wrong but this seems to suggest having undef constructors for an immutable makes the whole struct sit on the heap?

With the following it seems to be fast again:

struct Nullable{T}
    null::Bool
    x::T
end
@inline function Base.getindex(x::Nullable)
    x.null && throw(UndefRefError())
    return x.x
end

So, you still have to figure out something to put in x, though it would have a guard against access. So in a recursive struct this isn’t so bad as you can just use “self” as x.

Here’s the diff applying it: Permit n-argument operators by MilesCranmer · Pull Request #127 · SymbolicML/DynamicExpressions.jl · GitHub

So we still use “self” to initialise the child as before, but we also tag it with the Bool for the more helpful errors

julia> x1 = Node{Float64,2}(feature=1)
x1

julia> expr = Node{Float64,2}(op=1, children=(x1,))
unary_operator[1](x1)

julia> get_child(expr, 2)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex
   @ ~/PermaDocuments/SymbolicRegressionMonorepo/DynamicExpressions.jl/src/Utils.jl:66 [inlined]
#=...=#

julia> dump(expr)
Node{Float64, 2}
  degree: UInt8 0x01
  constant: Bool false
  val: Float64 2.234256312e-314
  feature: UInt16 0x0000
  op: UInt8 0x01
  children: Tuple{DynamicExpressions.UtilsModule.Nullable{Node{Float64, 2}}, DynamicExpressions.UtilsModule.Nullable{Node{Float64, 2}}}
    1: DynamicExpressions.UtilsModule.Nullable{Node{Float64, 2}}
      null: Bool false
      x: Node{Float64, 2}
        degree: UInt8 0x00
        constant: Bool false
        val: Float64 2.2195848863e-314
        feature: UInt16 0x0001
        op: UInt8 0x82
        children: #undef
    2: DynamicExpressions.UtilsModule.Nullable{Node{Float64, 2}}
      null: Bool true
      x: Node{Float64, 2}#= circular reference @-3 =#

Edit: still losing a tiny bit of a performance relative to before

master cdc980d3c9aa3f… master / cdc980d3c9aa3f…
utils/copy/break_sharing 28.3 ± 1.8 μs 30.8 ± 7.2 μs 0.918 ± 0.22
utils/copy/preserve_sharing 0.102 ± 0.0084 ms 0.108 ± 0.012 ms 0.944 ± 0.13

presumably due to allocating extra Bools.

So other alternatives are still very much welcome. Ideally one wouldn’t have to choose between ergonomics and efficiency.

1 Like

Seems so from the @code_llvm and when I checked the linked issue’s A type being thrown into RefValue. This reminded me that before v1.5, any mutable fields forced immutable wrappers to be heap-allocated because multiple GC roots couldn’t be tracked for a field then (which makes me wonder, is there a finite limit to the number of tracked roots now?). The PR mentions that uninitializable fields (evidently even if they would always be isbits) disqualify inlining for tracking, and the alternative would be to add an extra byte for tracking (again, would that be a finite limit?). Obviously I don’t really know how that works, but at least it affirms our intuition that storing a null pointer inline wasn’t the hurdle.

layout optimization internal changes (support pointers inlining/unboxing into parents/codegen) [disabled] by vtjnash · Pull Request #33886 · JuliaLang/julia · GitHub

It’s probably considered an edge case because of the very different behavior of uninitialized fields between isbits and non-isbits types, an awkward distinction to rely on, and because new can only uninitialize trailing fields. Optional only lucked out at having only 1 field to not initialize.

Allow new to accept keyword arguments for more control over incomplete initialization · Issue #36789 · JuliaLang/julia

An exception that wasn’t mentioned seems to be isbits types still being inlinable even if uninitializable:

julia> struct UndefComplex{T} # 32 bytes distinguishable from pointer's 8
         c::Complex{T}
         d::Complex{T}
         UndefComplex(c::Complex{T}, d::Complex{T}) where T = new{T}(c, d)
         UndefComplex{T}() where T = new{T}() # uninitialized
       end

julia> Base.allocatedinline(UndefComplex{Int}) # isbits
true

julia> Base.allocatedinline(UndefComplex{BigInt}) # not isbits
false

julia> sizeof(Ref(UndefComplex(big(1)*im, big(0)*im))) # just a pointer
8

julia> struct DoubleComplex{T}
         c::Complex{T}
         d::Complex{T}
         DoubleComplex(c::Complex{T}, d::Complex{T}) where T = new{T}(c, d)
       end

julia> Base.allocatedinline(DoubleComplex{BigInt}) # not isbits, but fully initialized
true

julia> sizeof(Ref(DoubleComplex(big(1)*im, big(0)*im)))
32
2 Likes