Default constructor for any type?

I think our messages crossed; is the code I sent above sort of what you mean?

I’m not sure why you would make the const fields a Union rather than a parameter.

Moreover, for the mutable fields, you just solve the type instability problem. However, in code that uses those mutable fields you still need to handle the type assertion error. At the earliest possible point, catch the error and do something to avoid crashing the entire program.

For example, notice how in Rust you are forced to handle both cases.

Do you mean a type parameter? But this is a recursive type.

I made it a union with nothing so that users can’t mistakenly access .feature in a node of the expression tree which is actually a scalar value.

I’m unclear how that affects my suggestion.

With your type definiton, we have to deal with type instability for your const fields. We may have to throw a type error.

julia> f(n) = n.constant
f (generic function with 1 method)

julia> n = Node{Int}(0,0,0,0,0, nothing, nothing)
Node{Int64}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)

julia> @code_llvm f(n)
;  @ REPL[10]:1 within `f`
define i8 @julia_f_208({}* noundef nonnull align 8 dereferenceable(40) %0) #0 {
top:
; ┌ @ REPL[4]:3 within `getproperty`
   %1 = bitcast {}* %0 to i8*
   %2 = getelementptr inbounds i8, i8* %1, i64 2
   %3 = load i8, i8* %2, align 1
   %exactly_isa = icmp eq i8 %3, 1
   br i1 %exactly_isa, label %pass, label %post_box_union

post_box_union:                                   ; preds = %top
   call void @ijl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 518642197424 to {}*), {}* inttoptr (i64 518796677128 to {}*))
   unreachable

pass:                                             ; preds = %top
   %4 = getelementptr inbounds i8, i8* %1, i64 1
; └
  %unbox2 = load i8, i8* %4, align 1
  ret i8 %unbox2
}

With my suggestion,

julia> abstract type AbstractExpressionNode{T} end

julia> mutable struct Node{
           T,
           C <: Union{Bool,Nothing},
           F <: Union{UInt16,Nothing},
           O <: Union{UInt8,Nothing}
       } <: AbstractExpressionNode{T}
           const degree::UInt8
           const constant::C
           val::Union{T,Nothing}
           const feature::F
           const op::O
           l::Union{Node{T},Nothing}
           r::Union{Node{T},Nothing}
           function Node{T}(
               degree, constant, val, feature, op,
               l = nothing,
               r = nothing
           ) where T
               C = isnothing(constant) ?
                   Nothing : Bool
               F = isnothing(feature) ?
                   Nothing : UInt16
               O = isnothing(op) ?
                   Nothing : UInt8
               new{T,C,F,O}(
                   degree, constant, val, feature, op, l, r
               )
           end
       end

julia> n = Node{Int}(0,0,0,0,0)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)

julia> n.l = n
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, Node{Int64, Bool, UInt16, UInt8}(#= circular reference @-1 =#), nothing)

julia> n.r = Node{Int}(0, nothing, nothing, nothing, nothing)
Node{Int64, Nothing, Nothing, Nothing}(0x00, nothing, nothing, nothing, nothing, nothing, nothing)

julia> n
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, Node{Int64, Bool, UInt16, UInt8}(#= circular reference @-1 =#), Node{Int64, Nothing, Nothing, Nothing}(0x00, nothing, nothing, nothing, nothing, nothing, nothing))

The generated code can be cleaner since we can determine the actual type of the const fields from the full type of the node. Thus, we know that a type error is not possible.

julia> f(n) = n.constant
f (generic function with 2 methods)

julia> @code_llvm f(n)
;  @ REPL[31]:1 within `f`
define i8 @julia_f_491({}* noundef nonnull align 8 dereferenceable(40) %0) #0 {
top:
; ┌ @ REPL[24]:3 within `getproperty`
   %1 = bitcast {}* %0 to i8*
   %2 = getelementptr inbounds i8, i8* %1, i64 1
; └
  %unbox = load i8, i8* %2, align 1
  ret i8 %unbox
}

Interesting. It’s not an issue that the children l and r have unknown type parameters, and you have to do runtime dispatch on those? That’s why I brought that up

That’s a useful trick which I’ll definitely keep in my back pocket, but isn’t this compensating for a weakness in the current compiler? A const field declared of Union type picks a branch of the union during construction and stays that way, so surely it would be legal to automatically construct the Node type to be parametric across that Union and resolve it, the way you’re doing manually.

const fields in mutable structs are new as of 1.8 so it’s forgivable that they aren’t completely integrated with the compiler yet, but this bit of sophistication shouldn’t be necessary, and I would have guessed that it isn’t. Note that it doesn’t have to effect the actual Node type parameter, it could be different compiler-specific metadata which provides the same affordance. I would find it surprising if the struct type grew parameters without an explicit declaration of them, but this doesn’t rule out a side channel.

No, a parameter distinguishing 2 types versus 1 type with a field that can be either of 2 types are semantically different and affect method compilation differently.

Don’t the typeasserts in the corresponding getproperty fix that in exchange for a runtime error? If it could somehow be guaranteed that nothing fields don’t ever get accessed, then the overhead of branching to an error should be minimal.

1 Like

This is true of parameters, yes, but as I clarified below your quote, the observation isn’t actually about parameters, it is about the fact that, contra your reply, const fields of Union type cannot in fact be either of two types. Instances of Union don’t exist, and once the field takes a value instance of one arm of the union, that value, and therefore its type, cannot change.

We’ve discussed an approach like this as an alternative to default constructors:

struct A{T}
    val::Union{T,Nothing}
    l::Union{A{T},Nothing}
    r::Union{A{T},Nothing}
end

But, for what it’s worth, Aqua.jl flags this for unbound type parameters… (Unbound Type Parameters · Aqua.jl). We would never actually trigger both fields to be nothing at the same time, but there’s no way to tell the compiler that.


Seems like there’s not really a clean solution here after all:

  1. Expr(:new, T) or the analogous C method creates a T, but is apparently dangerous.
  2. Moving val::T to the end of the struct lets you avoid needing to actually put any T there, but requires changing the order of your fields, and therefore isn’t extensible if you have multiple fields that you want to do this for.
  3. Using Union{T,Nothing} can generate type instabilities unless you manually patch in some type assertions (either at the access site or in a patched getproperty). This can also make Aqua complain about unbound type parameters (since you could make an object with all-nothing fields).
    • You can also add some type parameters to get around this, but to me it seems this might cause runtime dispatch since a node wouldn’t know all the type parameters of its children

I’ll probably just go with (2) for now.

Is it feasible to use a lookup for the constant values, just like you do for the operators? Something like this:

struct Node
    degree::UInt8
    constant::Bool
    val::UInt32
    feature::UInt16
    op::UInt8
    l::Base.RefValue{Node}
    r::Base.RefValue{Node}
end

struct Expression{T, B, U}
    values::Vector{T}
    ops::OperatorEnum{B, U}
    tree::Node
end

This is type stable. If a node is not a constant, then the value of the constant field will be 0. If a node is a constant, then you can look up its value with ex.values[val] (where ex is the parent Expression object).

(This example focuses on the val field. I don’t know how you want to handle the l and r fields—I just set them as Base.RefValue{Node} in this example.)

If you’re dealing with a whole forest of expression trees, you could probably have a single values lookup vector for all the constant values in the forest, like this:

struct Forest{T, B, U}
    values::Vector{T}
    ops::OperatorEnum{B, U}
    trees::Vector{Node}
end

The children l and r are mutable so we’re going to have to do runtime dispatch on them anyways since they can change type. You can mitigate this with @nospecialize.

The type assertions in getproperty are now where the type instability manifest. If we compile just for Node{T} then we do not know whether the type assertions will throw or not so there is now a branch in the code paths. If we compile for Node{T, Bool, UInt16, UInt8} then we know for the constant fields that the type assertion will not throw and we can just elide out the branch that throws.

What we do need to weigh here is the cost of compilation versus the efficiencies gained. With the three constant fields, there are 8 types for each T that we have to compile for.

n C F O
1 nothing nothing nothing
2 nothing nothing UInt8
3 nothing UInt16 nothing
4 nothing UInt16 UInt8
5 Bool nothing nothing
6 Bool nothing UInt8
7 Bool UInt16 nothing
8 Bool UInt16 UInt8

Thanks. Indeed I’m not sure how practical this is compared to simply having the field be constant but I will check to see.

The thing I was mainly concerned about with the Union{T,Nothing} approaches is whether it is amenable to static compilation. For example it doesn’t seem I can simply malloc structs made with union fields…

julia> struct MyStruct1
           a::Union{Float64,Nothing}
       end

julia> isbits(MyStruct1(1.0))
false

julia> struct MyStruct2
           a::Float64
       end

julia> isbits(MyStruct2(1.0))
true

(My assumption here is that T would be isbitstype FYI).

This is my original motivation for avoiding Union and any other type unstable code.

I see that I can’t simply work with undefined .val either in statically compiled code, as it needs to access all fields, leading me back to think about trying the jl_new_struct_uninit approach suggested by @foobar_lv2…

You are mixing various “static” concepts.

Let’s just talk about the meaning of isbits as below. All that we are asking below is whether it is immutable and can be composed of primitive types. In terms of memory allocation, we just need to understand the implications for memory layout.

julia> struct MyStruct3{A <: Union{Float64,Nothing}}
           a::A
       end

julia> isbits(MyStruct3(1.0))
true

julia> isbits(MyStruct3(nothing))
true

julia> isbitstype(MyStruct3{Float64})
true

julia> isbitstype(MyStruct3{Nothing})
true

julia> isbitstype(MyStruct3{Union{Float64,Nothing}})
false

julia> isbits(MyStruct3{Union{Float64,Nothing}}(1.0))
false

This has indirect implications for memory layout. Memory layout consists of two parts: 1) the data, and 2) the metadata to denote internal types. For bits types, we usually do not need any metadata because all the information we need is encoded in the overall type attached to binding. When there are abstract types or unions involved, we do need metadata to denote which types were actually used.

Below we see that the different parameters impact the number of bytes needed to store each type.

julia> sizeof(MyStruct3{Nothing})
0

julia> sizeof(MyStruct3{Float64})
8

julia> sizeof(MyStruct3{Union{Float64,Nothing}})
16

Seeing that we need at most 16 bytes to store each of the types above, let’s malloc that and wrap that as an array for convenience. We could have also just used zeros below, but I’m using malloc explicitly because you mentioned it.

julia> ptr = Libc.malloc(16)
Ptr{Nothing} @0x0000000002e81bc0

julia> A = unsafe_wrap(Array, Ptr{UInt}(ptr), 2; own=false);

julia> fill!(A, 0)
2-element Vector{UInt64}:
 0x0000000000000000
 0x0000000000000000

First, let’s look at the memory layout of MyFloat3{Float64}. From above we know this is 8-bytes, the same size of the Float64 contained within. Indeed all that we stored below is the floating point value.

julia> let S = MyStruct3{Float64}
           fill!(A, 0)
           unsafe_store!(Ptr{S}(ptr), S(1.0))
           display(A)
       end
2-element Vector{UInt64}:
 0x3ff0000000000000
 0x0000000000000000

julia> let S = MyStruct3{Float64}
           fill!(A, 0)
           unsafe_store!(Ptr{S}(ptr), S(0.5))
           display(A)
       end
2-element Vector{UInt64}:
 0x3fe0000000000000
 0x0000000000000000

The values 0x3ff and 0x3fe correspond to 1023 and 1022. These indicate the exponent of the floating point value offset by 1023. They thus correspond to 2^0 and 2^-1.

Next if we use MyStruct3{Union{Float64,Nothing}} we get the following.

julia> let S = MyStruct3{Union{Float64,Nothing}}
           fill!(A, 0)
           unsafe_store!(Ptr{S}(ptr), S(1.0))
           display(A)
       end
2-element Vector{UInt64}:
 0x3ff0000000000000
 0x0000000000007a01

julia> let S = MyStruct3{Union{Float64,Nothing}}
           fill!(A, 0)
           unsafe_store!(Ptr{S}(ptr), S(nothing))
           display(A)
       end
2-element Vector{UInt64}:
 0x00007fa5a4e9bc08
 0x0000000000000200

julia> let S = MyStruct3{Union{Float64,Nothing}}
           fill!(A, 0)
           unsafe_store!(Ptr{S}(ptr), S(0.25))
           display(A)
       end
2-element Vector{UInt64}:
 0x3fd0000000000000
 0x0000000000007a01

The first 8-bytes are the same as with MyStruct3{Float64} but the second set of eight bytes is now populated. The second set of eight bytes corresponds to the type.

Thus there is really no issue with using malloc with structs with Union types.

3 Likes

That seems like a drastic choice when there are simple options using vanilla Julia code, like the one I mentioned above:

Vanilla Julia Solution
struct Node
    degree::UInt8
    constant::Bool
    val::UInt32
    feature::UInt16
    op::UInt8
    l::Base.RefValue{Node}
    r::Base.RefValue{Node}
end

struct Expression{T, B, U}
    values::Vector{T}
    ops::OperatorEnum{B, U}
    tree::Node
end

It’s worth emphasizing that if you use jl_new_struct_uninit you are relying on Julia internals that could change with any new minor or patch release of Julia. So if your package uses jl_new_struct_uninit, you need to upper bound the Julia version in the compat section of your Project.toml file to the most recent Julia version on which you’ve tested that your package works. Nobody in the Julia ecosystem does this, but they should. The Julia ecosystem takes a very cavalier attitude towards using internals of other packages (either Base or other third-party packages).

Thanks, @mkitti, this is extremely helpful. Just to follow-up quickly, how would you actually use that in code? And there wouldn’t be any problems with recursive types, right? This entire thread has really been my attempt to figure out how to allocate the Node struct manually so I can do stuff with it in a StaticCompiler.jl context. If I can do that, I’m happy.

mutable struct Node{T} <: AbstractExpressionNode{T}
    const degree::UInt8
    const constant::Union{Bool,Nothing}
    val::Union{T,Nothing}
    const feature::Union{UInt16,Nothing}
    const op::Union{UInt8,Nothing}
    l::Union{Node{T},Nothing}
    r::Union{Node{T},Nothing}
end

@CameronBieganek This is unfortunately not a simple solution for me for several reasons (not to mention breaking an ecosystem of libraries using recursive methods that assume Node{T} is self-contained). This would lead to many new complexities like:

  1. Say you want to manipulate a single node and turn it from a constant (1.52) into a feature index (x1). What do you do to the values vector? The expression tree has potentially 1,000,000 nodes in it. Does that mean you need to go through the entire expression tree and update all the indices? Keep in mind you want to do millions of these mutations per second.
  2. Say you want to do a crossover two expressions (removing half of one tree while doing so). You need to figure out a smart way of combining their value vectors, which might have out-of-order indexing (again, you can have massive amounts of nodes in a single tree). (Equivalently, you might think of doing an algebraic simplification)
  3. There is the GraphNode{T} type which is also <:AbstractExpressionNode{T}. This node has the property of being a graph, rather than a binary tree. This means that some nodes have two parents! And therefore their constant values are shared. (Think of this like a multi-step equation, rather than a closed-form expression). This is super simple to do with self-contained nodes that can simply point at the same node, but would become impossibly difficult with this sort of global storage.

Basically it is much easier to do manipulations to expression trees that are self-contained : )

You’re entirely missing my point. Maybe a demo:

mutable struct X   const val::Union{Bool, Nothing}   end
mutable struct Y{T<:Union{Bool, Nothing}}   const val::T   end

println(typeof(X(true)) == typeof(X(nothing))) # true
println(typeof(Y(true)) == typeof(Y(nothing))) # false

This makes it impossible to automatically transform X into Y, and consting the field doesn’t make a difference. If you want it to, you’d need a breaking change to the type system.

I agree with the code branch part, I’m saying the typeassert in getproperty narrows down the inferred type by erroring if it encounters any other type:

getval(x::X) = x.val

println(Base.return_types(getval, (X,))) # Any[Union{Nothing, Bool}]

@inline function Base.getproperty(x::X, name::Symbol)
    if name == :val  getfield(x, :val)::Bool  end
end

println(Base.return_types(getval, (X,))) # Any[Bool]

But checking whether the field’s value has the allowed type needs getfield now.

Nothing happens to the values vector in that case. You just leave it alone. :slight_smile:

Previously I mentioned just using one values vector for the whole forest of expression trees, something like this:

struct Forest{T, B, U}
    values::Vector{T}
    ops::OperatorEnum{B, U}
    trees::Vector{Node}
end

What I had in mind is that each constant ID is unique—it only appears in one node instance. You can mutate the values vector if you want to change the constant value for that particular node. If you want to create a new node, you push! a new constant onto the values vector. In this scenario, you would never delete a constant value from the values vector (with, e.g, deleteat!). Crossovers are easy—you don’t need to touch the values vector at all, you just swap out Nodes in your expression (and leave the val IDs the same).

Anyhow, I’m going to bow out of this conversation now. I need to spend less time on Discourse.

1 Like

I’m not sure what is unclear, and I’m not sure if this works the way you think it does.

julia> n = Node{Int}(0,0,0,0,0, nothing, nothing)                 
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)

julia> ptr = Ptr{typeof(n)}(Libc.malloc(sizeof(n)*5))
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0

julia> unsafe_store!(ptr, n)
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0

julia> unsafe_load(ptr)                                           
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)

julia> n.val = 5
5

julia> unsafe_store!(ptr, n, 2)
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0

julia> unsafe_load(ptr, 2)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 5, 0x0000, 0x00, nothing, nothing)

julia> unsafe_load(ptr, 1)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)

This has no bearing whatsoever on the ability of the compiler to emit type stable code for instances of a Union which are constant.

To cite the post you are replying to:

const isn’t a characteristic of an instance, it’s a characteristic of a variable or field. You still haven’t addressed how that would be factored into the suggested compiler optimization, whether it’s a formal type transformation or not. Very specifically, how could:

mutable struct X   const val::Union{Bool, Nothing}   end
foo(x::X) = (x, fill(x.val, 1))
@code_warntype foo(X(false)) # ::Tuple{X, Union{Vector{Bool}, Vector{Nothing}}}

be type-stable? foo(::X) must infer val::Union{Bool, Nothing} because it must be compiled for any instance of X (copies of X(nothing), X(true), X(false)). val being a const field alone does not get around that at all because the compiler does not distinguish instances of X based on an unknown val instance.

You would need to provide the compiler with the value of val for a possible inference optimization:

const myval = false
# mutable X(myval) is not a compile-time value, but myval is
foo2() = @inline foo(X(myval))
# mutable X(staticval) is not a compile-time value, but staticval is
foo3(::Val{staticval}) where staticval = @inline foo(X(staticval))

That is possible but also very limited. For example, the compiler can’t tell which elements of xvector::Vector{X} have the more inferrable val after a few push!(xvector, X(myval))

1 Like