Both T1 and T2 can be instantiated, by writing T1(nothing) and T2() respectively in the REPL. Still, I do not know which one is more performant. To analyse their performance, I tried to guess how the compiler will react to each of them.
I think T1 may fail to be performant, because it refers to the type Union{Nothing, T1}, making the compiler’s type inference more difficult. And I think T2 may fail to be performant, because when I type T2() in the REPL I get T2(#undef).
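For reference, the two definitions under discussion presumably look roughly like this (a sketch reconstructed from the descriptions above; the field name `next` is my own):

```julia
# Self-reference via a small Union: the field is either `nothing`
# or another T1, so every T1 is fully initialized on construction.
mutable struct T1
    next::Union{Nothing, T1}
end

# Self-reference via incomplete initialization: the zero-argument
# inner constructor leaves `next` undefined (shown as #undef in the REPL).
mutable struct T2
    next::T2
    T2() = new()
    T2(next::T2) = new(next)
end
```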
Small Unions are pretty well optimized in the compiler, so either should be efficient. The best way to find out is to try both and profile each. I don’t think there’s going to be much of a difference though - what are you trying to do that you’re worried about the efficiency of one over the other?
And I think T2 may fail to be performant, because when I type T2() in the REPL I get T2(#undef).
No, values with undefined (#undef) fields do not suffer performance-wise. I am simplifying a bit here, but for an object a of type T2, the compiler always knows that accessing a.field will either yield an error (if it is undefined) or something of type T2. So, modulo the error throwing, this is as good as always knowing that a.field is of type T2. This is not the case for T1. But as @Sukera said, you can profile both and see if either is fast enough for your application.
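To see this concretely, here is a small illustration of that behavior (the struct name `C` is mine, standing in for the incomplete-initialization pattern):

```julia
mutable struct C
    next::C
    C() = new()          # leaves `next` undefined (#undef)
end

c = C()
isdefined(c, :next)      # false

# Accessing the undefined field throws, rather than returning garbage:
try
    c.next
catch err
    err isa UndefRefError  # true: the field access includes a null check
end

# But once the field is set, its type is known to be exactly C:
c.next = c
typeof(c.next)           # C
```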
But it is possible for the compiler to infer that an object of type FooRec cannot have an uninitialized field, because FooRec’s only inner constructor constructs objects with initialized fields.
I am not sure the compiler makes this inference, but I expect it does.
The problem is that a partially initialized object exists within the constructor, and the compiler doesn’t seem to care that it never actually goes into the wild. Cf.
julia> mutable struct A
           a
           A(a) = new(a)
       end

julia> mutable struct B
           b
           B(b) = (self = new(); self.b = b; self)
       end
julia> foo(a::A) = isdefined(a, :a)
foo (generic function with 1 method)
julia> foo(b::B) = isdefined(b, :b)
foo (generic function with 2 methods)
julia> @code_llvm foo(A(1))
; @ REPL[3]:1 within `foo'
define i8 @julia_foo_873(%jl_value_t* nonnull align 8 dereferenceable(8)) {
top:
ret i8 1
}
julia> @code_llvm foo(B(1))
; @ REPL[4]:1 within `foo'
define i8 @julia_foo_875(%jl_value_t* nonnull align 8 dereferenceable(8)) {
top:
%1 = bitcast %jl_value_t* %0 to %jl_value_t**
%2 = load %jl_value_t*, %jl_value_t** %1, align 8
%3 = icmp ne %jl_value_t* %2, null
%4 = zext i1 %3 to i8
ret i8 %4
}
So, when I said “self-referential structures have an inherent overhead”, I meant that it’s because the only ways to create one, as you mentioned in the OP, are either to have a type-unstable field or to use explicit incomplete initialization. In terms of runtime overhead, both ways must be close - either a check whether the field is null, or a check whether the Nothing flag is set.
The next question, though, is how high is that overhead compared to the indirection due to the use of a mutable struct?
I guess that I do not have a way out of this overhead. I do not know if there is an efficient way of implementing most of the usual data structures (e.g. linked lists, trees) without some form of self-reference.
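For what it’s worth, the Union approach maps directly onto, e.g., a singly linked list; a rough sketch (all names are mine):

```julia
mutable struct Node{T}
    value::T
    next::Union{Nothing, Node{T}}
end

# Prepend a value; `nothing` plays the role of the empty list.
cons(v::T, rest::Union{Nothing, Node{T}}) where {T} = Node{T}(v, rest)

# Walk the `next` chain and sum the values. The `node !== nothing`
# test is the Union check; with incomplete initialization it would
# instead be an isdefined/null check on the field.
function total(node::Node{T}) where {T}
    s = zero(T)
    while node !== nothing
        s += node.value
        node = node.next
    end
    s
end

lst = cons(1, cons(2, cons(3, nothing)))
total(lst)  # 6
```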
Finally, I don’t think that the timings are real, @Vasily_Pisarev. Sub-nanosecond timings are always finicky and I think all you’re measuring is the speed of a dereference of the Refs (since the contents of Refs are constructed beforehand and even marked const). This would not be a typical usecase.
Looks like the difference is that the recursive version has to be heap allocated (makes sense, since it references itself and should be alive after construction) whereas the FooStruct can be stack allocated (also makes sense, since it’s basically just a wrapper around an object and those are fast nowadays - especially if the called function just checks a single hard coded field).
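Assuming FooStruct is essentially a plain immutable wrapper and FooRec is the self-referential mutable variant (my reconstruction from this thread, not the original definitions), the heap-vs-stack distinction can be checked directly:

```julia
struct FooStruct        # plain immutable wrapper around a value
    x::Int
end

mutable struct FooRec   # self-referential, points at itself
    x::Int
    rec::FooRec
    FooRec(x) = (self = new(x); self.rec = self; self)
end

isbitstype(FooStruct)   # true: no references, can live on the stack
isbitstype(FooRec)      # false: mutable self-reference forces a heap box
```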
In the end, I wouldn’t worry about the speed of construction of either of those approaches until you know that they’re a performance bottleneck for your application. The compiler is smart - don’t try to outsmart it before you’ve written your core code.
My comment was on the runtime performance of already-constructed structs, not on the construction speed.
When the constructor does not have incomplete initialization, the compiler is smart enough to derive that the structure will not have an undefined field and eliminate a NULL check from getproperty (the same seems to be true for uninitialized isbits-type fields - they are “always defined”, even if sometimes their content is just random garbage). Sub-ns timings in the examples mean exactly that: the compiler can derive that the field is defined based on the type alone, without any actual dereferencing and checks (Ref is a common trick to prevent constant folding in microbenchmarks, since const Ref implies only a constant address, but not constant contents).
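The Ref trick mentioned here, sketched with hypothetical names of my own:

```julia
struct P
    x::Int
end

const p_direct = P(1)       # const value: calls on it may constant-fold
const p_ref = Ref(P(1))     # const *address*, mutable contents:
                            # the compiler must actually load p_ref[]

get_direct() = p_direct.x   # can fold all the way to the literal 1
get_ref() = p_ref[].x       # forces a real dereference at runtime

get_ref()        # 1
p_ref[] = P(2)   # contents can change, so the load cannot be folded away
get_ref()        # 2
```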