What is the most performant way of bootstrapping a self-referential mutable struct?

Consider the self-referential struct below:

mutable struct T
    field::T
end

The circularity in the way T was defined prevents me from ever constructing an object of type T. I have considered two ways out:

mutable struct T1
    field::Union{Nothing, T1}
end
mutable struct T2
    field::T2
    T2() = new()
    T2(x) = new(x)
end

Both T1 and T2 can be instantiated, by written respectively T1(nothing) and T2() in the REPL. Still, I do not know which one is more performant. To analyse their performance, I tried to guess how the compiler will react to each of them.

I think T1 may fail to be performant , because it refers to the type Union{Nothing, T1}, making the compiler’s type inference more difficult. And I think T2 may fail to be performant , because when I type T2() in the REPL I get T2(#unded).

Small Unions are pretty well optimized in the compiler, should be efficient. The best way to find out is to try both and profile each. I don’t think there’s going to be much of a difference though - what are you trying to do that you’re worried about the efficiency of one over the other?

2 Likes

And I think T2 may fail to be performant , because when I type T2() in the REPL I get T2(#unded) .

No, values with undefined (#undef) fields do not suffer performance wise. I am a bit simplifying here, but for an object a of type T2, the compiler always knows that accessing a.field will yield an error (if it is undefined) or something of type T2. So, modulo the error throwing, this is as good as always knowing that a.field is of type T2. This is not the case for T1. But as @Sukera said, you can profile both and see if it is fast enough for your application.

1 Like

what are you trying to do that you’re worried about the efficiency of one over the other?

There is a DataStructure that I am trying to implement. I will implement it as a self-referential struct.

Because my struct will involve self-reference, I need a way to assess how the self reference affects it’s performance.

It appears from a short testing that self-referential structures do indeed have an inherent overhead:

julia> mutable struct FooStruct{T}
       x::T
       FooStruct(obj::T) where T = new{T}(obj)
       end

julia> mutable struct FooRec
       x::FooRec
       FooRec() = (self = new(); self.x = self)
       end

julia> const refstruct = Ref(FooStruct([]))
Base.RefValue{FooStruct{Array{Any,1}}}(FooStruct{Array{Any,1}}(Any[]))

julia> const refrecstruct = Ref(FooRec())
Base.RefValue{FooRec}(FooRec(FooRec(#= circular reference @-1 =#)))

julia> foo(s) = isdefined(s, :x)
foo (generic function with 1 method)

julia> @btime foo(refstruct[])
  0.016 ns (0 allocations: 0 bytes)
true

julia> @btime foo(refrecstruct[])
  1.556 ns (0 allocations: 0 bytes)
true

So, the compiler does recognize that FooStruct cannot possibly have an uninitialized field but cannot prove the same guarantee for FooRec.

2 Likes

You can check the resulting LLVM-level code with @code_llvm:

julia> @code_llvm foo(refstruct[])
;  @ REPL[5]:1 within `foo'
define i8 @julia_foo_169(%jl_value_t* nonnull align 8 dereferenceable(8) %0) {
top:
  ret i8 1
}

julia> @code_llvm foo(refrecstruct[])
;  @ REPL[5]:1 within `foo'
define i8 @julia_foo_178(%jl_value_t* nonnull align 8 dereferenceable(8) %0) {
top:
  %1 = bitcast %jl_value_t* %0 to %jl_value_t**
  %2 = load %jl_value_t*, %jl_value_t** %1, align 8
  %3 = icmp ne %jl_value_t* %2, null
  %4 = zext i1 %3 to i8
  ret i8 %4
}
1 Like

But it is possible for the compile to infer that an object of type FooRec cannot have an uninitialized field, because FooRec’s only internal constructor constructs objects with initialized fields.

I am not sure the compiler makes this inference, but I expect it does.

The problem is, a partially initialized object exists within constructor, and the compiler doesn’t seem to care that it never actually goes into the wild. Cf.

julia> mutable struct A
       a
       A(a) = new(a)
       end

julia> mutable struct B
       b
       B(b) = (self = new(); self.b = b; self)
       end

julia> foo(a::A) = isdefined(a, :a)
foo (generic function with 1 method)

julia> foo(b::B) = isdefined(b, :b)
foo (generic function with 2 methods)

julia> @code_llvm foo(A(1))

;  @ REPL[3]:1 within `foo'
define i8 @julia_foo_873(%jl_value_t* nonnull align 8 dereferenceable(8)) {
top:
  ret i8 1
}

julia> @code_llvm foo(B(1))

;  @ REPL[4]:1 within `foo'
define i8 @julia_foo_875(%jl_value_t* nonnull align 8 dereferenceable(8)) {
top:
  %1 = bitcast %jl_value_t* %0 to %jl_value_t**
  %2 = load %jl_value_t*, %jl_value_t** %1, align 8
  %3 = icmp ne %jl_value_t* %2, null
  %4 = zext i1 %3 to i8
  ret i8 %4
}

So, when I said “self-referential structures have an inherent overhead”, I meant that it’s because the only ways to create one, as you mentioned in the OP, are either have a type-unstable field or an explicit incomplete initialization. In terms of runtime overhead, both ways must be close - either a check if the field is null or check if the Nothing flag is set.

The next question, though, is how high is that overhead compared to the indirection due to the use of a mutable struct?

I guess that I do not have a way out of this overhead. I do not know if there is an efficient way of implementing most of the usual data structures(e.g. Linked List, Tree) without a form of self reference.

Maybe something like Union{Core.Ref{T},Nothing} would work.
Also have a look at https://github.com/JuliaCollections/DataStructures.jl/blob/master/src/list.jl

1 Like

If you’re interested in certain data structures, you can check out DataStructures.jl.

There’s also a short section in the manual about incomplete initialization.

Finally, I don’t think that the timings are real, @Vasily_Pisarev. Sub-nanosecond timings are always finicky and I think all you’re measuring is the speed of a dereference of the Refs (since the contents of Refs are constructed beforehand and even marked const). This would not be a typical usecase.

julia> @benchmark foo(FooRec())
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     5.506 ns (0.00% GC)
  median time:      6.406 ns (0.00% GC)
  mean time:        12.244 ns (1.15% GC)
  maximum time:     530.030 ns (90.05% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> @benchmark foo(FooStruct(x)) setup=(x=[])
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.300 ns (0.00% GC)
  median time:      1.400 ns (0.00% GC)
  mean time:        1.374 ns (0.00% GC)
  maximum time:     19.100 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Looks like the difference is that the recursive version has to be heap allocated (makes sense, since it references itself and should be alive after construction) whereas the FooStruct can be stack allocated (also makes sense, since it’s basically just a wrapper around an object and those are fast nowadays - especially if the called function just checks a single hard coded field).

In the end, I wouldn’t worry about the speed of construction of either of those approaches until you know that they’re a performance bottleneck for your application. The compiler is smart - don’t try to outsmart it before you’ve written your core code :slight_smile:

2 Likes

My comment was on the runtime performance of already-constructed structs, not on the construction speed.

When the constructor does not have incomplete initialization, the compiler is smart enough to derive that the structure will not have an undefined field and eliminate a NULL check from getproperty (the same seems to be true for unitialized isbits type fields - they are “always defined”, even if sometimes their content is just random garbage). Sub-ns timings in the examples mean exactly that: the compiler can derive that the field is defined based on the type alone, without any actual dereferencing and checks (Ref is a common trick to prevent constant folding in microbenchmarks, since const Ref implies only constant address, but not constant contents).

1 Like