Default constructor for any type?

Any number of ways; the point is that the optimization is available because instances cannot switch the type of a const field after instantiation.

You can come up with examples where that optimization wouldn’t be available; that’s generally true of optimizations. But the given example is not one of those cases.

This can clearly be specialized for both arms of the Union. That would be legal; the source code contains the necessary information to determine that it’s legal.

How this might be implemented is a detail, one which is quite irrelevant to my point.

I don’t see anywhere in the examples so far where the compiler would have access to the instance or its const field value.

I’d like to repeat my complaints that this layout is really really bad:
julia> dumplayout(::Type{T}) where T = push!([(Int(fieldoffset(T, idx)), fieldtype(T,idx), fieldname(T, idx)) for idx=1:length(fieldnames(T))], (sizeof(T), Nothing, :end) )

julia> dumplayout(Node{Int})
8-element Vector{Tuple{Int64, Type, Symbol}}:
 (0, UInt8, :degree)
 (1, Union{Nothing, Bool}, :constant)
 (8, Union{Nothing, Int64}, :val)
 (18, Union{Nothing, UInt16}, :feature)
 (21, Union{Nothing, UInt8}, :op)
 (24, Union{Nothing, Node{Int64}}, :l)
 (32, Union{Nothing, Node{Int64}}, :r)
 (40, Nothing, :end)

Look at all the gaps! Don’t use union types for :feature, :constant, :op . Instead, reserve one value (0? 0xff?) for the case where the field is unset. And move :val to after :op, before :l, to avoid the gaps. (Struct fields are laid out in the order you declare them, and each type has a minimal alignment it needs, so struct X a::Int8; b::Int64; c::Int8 end is larger than struct Y a::Int8; c::Int8; b::Int64 end.)
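
A quick way to see the padding cost (a minimal sketch using the X and Y from the parenthetical; sizes are for a 64-bit machine):

julia> struct X a::Int8; b::Int64; c::Int8 end

julia> struct Y a::Int8; c::Int8; b::Int64 end

julia> sizeof(X), sizeof(Y)   # X pays 8-byte alignment padding before b and trailing padding after c
(24, 16)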

Let me show you an example where const-ness helps.

mutable struct X contents::Union{Nothing, Int64} end
function foo(x::X, n)
  for i=1:n
    bar()
    bar(x.contents)
  end
end

Suppose that bar is not inlineable, complicated, and somewhat opaque to the compiler. Thanks to union-splitting, this will be transformed into

function foo(x::X, n)
  for i=1:n
    bar()
    if x.contents isa Nothing
      bar(nothing)
    else
      bar(x.contents::Int64)
    end
  end
end

This is the best the compiler can do: It needs to emit a single branch, and then can call the correct implementation. We cannot do better because bar() might access global variables that hold a reference to x and change the type of x.contents.
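
To see why, here is a hypothetical sketch (using the non-const X and foo defined above) of a bar that reaches x through a global and flips the field’s type between iterations:

# Hypothetical: bar() can reach x through this global and mutate the field's type.
const SHARED = Ref{Any}(nothing)

# Zero-argument bar flips the type stored in x.contents on every call.
bar() = (x = SHARED[]; x isa X && (x.contents = x.contents isa Nothing ? 0 : nothing); nothing)
bar(c) = nothing                   # the one-argument method called in the loop

x = X(1)
SHARED[] = x
foo(x, 3)   # x.contents alternates between Int64 and Nothing across iterations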

Suppose you had instead mutable struct X const contents::Union{Nothing, Int64} end. Now the compiler knows that x.contents cannot change types, and the check and access can be hoisted out of the loop:

function foo(x::X, n)
  if x.contents isa Nothing
    for i=1:n
      bar()
      bar(nothing)
    end
  else
    tmp = x.contents::Int
    for i=1:n
      bar()
      bar(tmp)
    end
  end
end

This is good for performance.

But hoisting of checks and loads is really the only thing constness helps with.

3 Likes

The compiler should be able to improve on this. Without suggesting an implementation, consider the Union as creating two virtual types, X₁ and X₂, one for each branch of the Union. Then it could generate

function foo(x::X₁, n)
    for i=1:n
      bar()
      bar(nothing)
    end
end 

# and 

function foo(x::X₂, n)
    tmp = x.contents::Int
    for i=1:n
      bar()
      bar(tmp)
    end
end

Although it would have to look up which “virtual type” applied at runtime if it can’t infer that information, just like with parametric types now.

This is ignoring all the details which a real implementation would need, like a heuristic limit on how many virtual types get generated, how it specializes, what effect that would have on the inner workings of the type system, and so on. But the optimization is available, because it’s legal: any instance is an X₁ or an X₂ and can’t change from one category to the other.

If you want this, you write

mutable struct X{T <: Union{Nothing, Int64}}
    const x::T
end

Now you have 3 types: X{Int64}, X{Nothing} and finally X{Union{Int64, Nothing}}.

By writing struct X x::Union{Nothing, Int64} end you explicitly instruct the compiler to generate one type, not 3. You cannot complain afterwards that it should have generated 3 types when you were so explicit in your intentions.
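
For illustration, a quick sketch of how the three types show up when constructing instances of the parametric X just defined:

julia> X(1) isa X{Int64}            # the default constructor picks the concrete parameter
true

julia> X(nothing) isa X{Nothing}
true

julia> X{Union{Nothing, Int64}}(1) isa X{Union{Nothing, Int64}}   # the unspecialized one must be requested explicitly
true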

1 Like

Just to emphasize your point: this actually does already happen today, since some compiler annotations (e.g. @inbounds/@boundscheck) rely on the notion that the only possible way to create the object is to go through the official constructor. Having code that might emit a constructor for it elsewhere could already undermine the safety of the compiler for arbitrary bits of code and cause memory corruption or other UB accordingly.
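
For example, something along these lines (a hypothetical sketch, not from any particular package) is only sound because the inner constructor is the single entry point:

struct FixedWindow
    data::Vector{Float64}
    function FixedWindow(data::Vector{Float64})
        length(data) >= 4 || throw(ArgumentError("need at least 4 elements"))
        new(data)
    end
end

# Safe to skip bounds checks only because every FixedWindow went through the constructor above;
# a compiler-emitted "default constructor" that bypasses it would break this assumption.
first4(w::FixedWindow) = @inbounds (w.data[1], w.data[2], w.data[3], w.data[4])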

4 Likes

This would only help if we had the unlimited memory of a Turing machine and the time dilation of a time machine. Union-splitting has a very small limit, so only functions on 1, maybe 2, isolated fields like x.contents can utilize it. If we had a bar(x.contents, y.contents2, z.contents3), we very reasonably fall back to general runtime dispatch instead of compiling 8 variants of bar. If X had more than one const field with a Union type, then you’d get an unscalable combinatorial explosion if you unconditionally compiled a function foo of X over every derivative concrete-parametric type X1, X2, …, Xn, where n is the product of the sizes of all the Unions, even when foo doesn’t need every field of X.

Sure, the compiler could reduce the derivative Xi types by only considering the relevant fields, but why do more work for foo(x::X, n) when compiling variants for bar(x.contents) already accomplishes the necessary optimization, and does it better? Imagine a foo2 with 2 separate loops, one calling bar(x.contents) and the other calling baz(x.contents2), where that field is const contents2::Union{Nothing, Int, Float64}. You only need to Union-split over 2 bar variants and then Union-split over 3 baz variants, as opposed to making 6 Xi variants over which you can’t Union-split because you exceeded the limit. Worse, when you also compile foo, its X1 and X2 types are different from those of foo2, further contributing to a combinatorial explosion of compilation. With limited memory and time in reality, reducing compilation is important and has been heavily improved throughout Julia v1.x, and a code generation scheme that makes it much worse is unlikely to happen.
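
To make the count concrete, a hypothetical sketch of a struct with the two fields discussed above (field names invented to match the post):

mutable struct X2
    const contents::Union{Nothing, Int64}             # 2 possible field types
    const contents2::Union{Nothing, Int64, Float64}   # 3 possible field types
end

# Splitting foo2(x::X2, n) over whole-struct virtual types needs 2 × 3 = 6 variants,
# while Union-splitting at the call sites needs only 2 variants of bar plus 3 of baz.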

I think we should consider adding those as features for v2.0. It would be a breaking change to the laws of physics, sure, but the developer experience would be awesome.

Julia reserves the right to produce more optimal code than the user has requested.

For instance, if we have a function

f(a, b, c) = a * b + c

You might argue that, having specified the parameters as ::Any, I have no right to complain if the compiler looks up * and + at run time.

Be that as it may, if I call f with f(1,2,3), the Julia compiler, being a generous sort, will specialize the method accordingly, producing a virtual sort of f(a::Int, b::Int, c::Int). Virtual in the sense that if I later define f(a::Int, b::Int, c::Int), that definition will overshadow the compiler’s version.
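
One way to observe those specializations (a minimal sketch; the typed IR output is elided here):

f(a, b, c) = a * b + c

code_typed(f, (Int, Int, Int))            # the Int,Int,Int specialization: plain integer * and +
code_typed(f, (Float64, Float64, Int))    # a separate specialization for these argument types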

Similarly, the compiler retains the right to specialize both branches of a Union, if it makes sense to do so, and leads to better code. The fact that it doesn’t do so currently is not a rule of law.

It sounds like you’re quite knowledgeable about the inner workings of compilers; could you point me toward all the clever work on them you’ve obviously done?

Personal snark is no substitute for constructive debate. I suggest you stick with the latter as you had previously, as much as I sympathize with the mild difficulties of disagreement.

I do want to note that Julia does do union splitting within a function.
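
(The definitions of N and f aren’t included in this excerpt; judging from the literal_pow IR below and from g(1) == 5 further down, they were presumably something along these lines, with the field name x being a guess:)

struct N
    x::Union{Int64, Float64}
end

f(n) = n.x^2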

julia> @code_llvm f(N(2.0))
;  @ REPL[53]:1 within `f`
define { {}*, i8 } @julia_f_396([8 x i8]* noalias nocapture noundef nonnull align 8 dereferenceable(8) %0, { i64, i8 }* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1) #0 {
top:
; ┌ @ Base.jl:37 within `getproperty`
   %2 = getelementptr inbounds { i64, i8 }, { i64, i8 }* %1, i64 0, i32 1
   %3 = load i8, i8* %2, align 8
; └
  %exactly_isa.not = icmp eq i8 %3, 0
  br i1 %exactly_isa.not, label %union_move, label %union_move2

post_union_move:                                  ; preds = %union_move2, %union_move
  %tindex_phi12 = phi i8 [ 2, %union_move2 ], [ 1, %union_move ]
  %4 = insertvalue { {}*, i8 } { {}* null, i8 undef }, i8 %tindex_phi12, 1
  ret { {}*, i8 } %4

union_move:                                       ; preds = %top
; ┌ @ intfuncs.jl:332 within `literal_pow`
; │┌ @ float.jl:411 within `*`
    %5 = bitcast { i64, i8 }* %1 to double*
    %unbox = load double, double* %5, align 8
    %6 = fmul double %unbox, %unbox
; └└
  %7 = bitcast [8 x i8]* %0 to double*
  store double %6, double* %7, align 8
  br label %post_union_move

union_move2:                                      ; preds = %top
; ┌ @ Base.jl:37 within `getproperty`
   %8 = getelementptr inbounds { i64, i8 }, { i64, i8 }* %1, i64 0, i32 0
; └
; ┌ @ intfuncs.jl:332 within `literal_pow`
; │┌ @ int.jl:88 within `*`
    %unbox4 = load i64, i64* %8, align 8
    %9 = mul i64 %unbox4, %unbox4
; └└
  %10 = bitcast [8 x i8]* %0 to i64*
  store i64 %9, i64* %10, align 8
  br label %post_union_move
}

The compiler also does constant propagation.

julia> g(x) = f(N(2one(x))) + x
g (generic function with 2 methods)

julia> g(1)
5

julia> @code_llvm g(1)
;  @ REPL[63]:1 within `g`
define i64 @julia_g_425(i64 signext %0) #0 {
top:
; ┌ @ int.jl:87 within `+`
   %1 = add i64 %0, 4
; └
  ret i64 %1
}

julia> @code_llvm g(3.5)
;  @ REPL[63]:1 within `g`
define double @julia_g_429(double %0) #0 {
top:
; ┌ @ float.jl:409 within `+`
   %1 = fadd double %0, 4.000000e+00
; └
  ret double %1
}

Method specialization for union splitting in the absence of inlining or constant propagation would seem to violate Julia’s semantics, though. While a static compiler would have significant freedom to perform that optimization, doing so in Julia’s dynamic context seems problematic. Julia’s multiple dispatch mechanism is intimately connected with the type definition and its parameters.

Moreover in Julia we are often considering both compile time and execution time, so a user embedding the union as a field type may be explicitly requesting not to perform that optimization.

This is utter nonsense. It is your explicit choice whether f specializes on a,b,c. If you don’t want it to specialize you write f(@nospecialize(a), @nospecialize(b), @nospecialize(c)) = ....

This is not a heuristic; it is explicitly under programmer control. It is your explicit choice whether f gets specialized, just like it is your explicit choice whether to specialize/parametrize a type. In the same way, the order and alignment of fields and the resulting structure padding are not a compiler decision; they are an integral part of the Julia language and an explicit programmer choice.

Likewise, the type of mutable struct X const c::Union{Nothing, Int64} end is not a compiler decision. We have a C-API. The following is guaranteed by the language and not up to compiler optimizations:

julia> mutable struct X const x::Union{Nothing,UInt64} end
julia> x=X(1);
julia> pointer_from_objref(X)
Ptr{Nothing} @0x00007a6355f11950
julia> p=pointer_from_objref(x);
julia> unsafe_load(convert(Ptr{Ptr{Nothing}}, p)-8)
Ptr{Nothing} @0x00007a6355f11950

The type of an object is the thing that is pointed at by its object header, and this type is observable, well-defined, and not subject to optimization decisions. Mutable objects have an identity and a stable address; there is nothing to optimize here.

(something I would try to optimize is to allocate all DataTypes in a special arena, and only use 4 bytes for the type pointer in the object header)

I missed this when you mentioned it.

I actually have a type like this in one of my packages. The struct only exists as a reinterpret target for other VM instructions; it exposes the opcode byte so that they can be reinterpreted back to their actual structure. It’s a hack for type stability, basically.

But instantiating it isn’t valid, so I defined the only constructor to throw an error, just in case. So if I needed a default constructor for this, for some reason, I’d have to write one which uses reinterpret.
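
Something like this hypothetical sketch of the pattern (all names invented; the real package surely differs):

# Hypothetical sketch: a struct that can only be obtained by reinterpreting some
# other instruction, never constructed directly.
struct OpaqueInstr
    opcode::UInt8
    operand::UInt8
    OpaqueInstr() = error("OpaqueInstr is only a reinterpret target")  # suppresses the default constructor
end

struct AddInstr
    opcode::UInt8
    operand::UInt8
end

instrs = [AddInstr(0x01, 0x2a)]
opaque = reinterpret(OpaqueInstr, instrs)[1]   # same bits, viewed through the opaque type
back   = reinterpret(AddInstr, [opaque])[1]    # the opcode byte tells you which concrete type to go back to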

1 Like

This wouldn’t change the type of the struct, any more than a specialization changes the signature of the specialized method. That is, in fact, the essence of my point in comparing method specializations to the proposed optimization, which is of course not nonsense at all. It would be entirely legal to do. Perhaps it would make sense to have a @novariant macro to inhibit the virtual type specialization, perhaps not, it’s pointless to discuss without an implementation.

It wouldn’t violate the semantics, but it would call for significant changes to the compiler internals. The idea is that for a small number of constant Union fields, within Union-splitting limits, the compiler could issue specialized (in the method sense), or virtual, types for each variation. These would be visible to the compiler, but not to the type system, and could be used to improve code generation when it’s possible to infer which variation applies. The method specialization process could make use of them, but not the dispatch process, just as with parameterizing.

Your second paragraph is about whether this would be worth it; I’ll call that an open question :slight_smile: A wild guess says that four variants would be, eight might be, and more would not be. Four covers two constant Union{T,Nothing} fields, which is going to be the most common pattern where an optimization like this could be applied; eight covers three such fields.

It would also be a lot of work on the internals. It could be interesting to explore how useful such an optimization might be, by looking for code in the wild which would benefit from it and running some benchmarks using a parameterization of the same struct, which would have a similar effect on generated code.
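
For instance, a benchmark along these lines (a hypothetical sketch using BenchmarkTools; names invented) would compare the Union-field layout against the parameterization of the same struct:

using BenchmarkTools

mutable struct UnionField
    const contents::Union{Nothing, Int64}
end

mutable struct ParamField{T <: Union{Nothing, Int64}}
    const contents::T
end

# Same loop body over both layouts; ParamField{Int64} is concretely typed.
work(x, n) = sum(i -> something(x.contents, 0) + i, 1:n)

@btime work($(UnionField(1)), 1_000)
@btime work($(ParamField(1)), 1_000)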

But the point I’ve made here is that this would be legal for the compiler to do, in accordance with the existing semantics of Julia’s type system. I’m happy to dispense with any and all questions of practicality: the code would be somewhat faster, the compiler somewhat slower, and there would be labor involved. Do those lines cross in the right place? No idea.

I actually resurrected this thread because I came back to it to crib your code for parameterizing Union types, so that I can get better code for a similarly-shaped problem manually. So thanks for your contribution. :slight_smile: