I think our messages crossed; is the code I sent above sort of what you mean?
Iām not sure why you would make the const fields a Union rather than a parameter.
Moreover, for the mutable fields, you just solve the type instability problem. However, in code that uses those mutable fields you still need to handle the type assertion error. At the earliest possible point, catch the error and do something to avoid crashing the entire program.
For example, notice how in Rust you are forced to handle both cases.
Do you mean a type parameter? But this is a recursive type.
I made it a union with nothing so that users canāt mistakenly access .feature in a node of the expression tree which is actually a scalar value.
Iām unclear how that affects my suggestion.
With your type definiton, we have to deal with type instability for your const fields. We may have to throw a type error.
julia> f(n) = n.constant
f (generic function with 1 method)
julia> n = Node{Int}(0,0,0,0,0, nothing, nothing)
Node{Int64}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)
julia> @code_llvm f(n)
; @ REPL[10]:1 within `f`
define i8 @julia_f_208({}* noundef nonnull align 8 dereferenceable(40) %0) #0 {
top:
; ā @ REPL[4]:3 within `getproperty`
%1 = bitcast {}* %0 to i8*
%2 = getelementptr inbounds i8, i8* %1, i64 2
%3 = load i8, i8* %2, align 1
%exactly_isa = icmp eq i8 %3, 1
br i1 %exactly_isa, label %pass, label %post_box_union
post_box_union: ; preds = %top
call void @ijl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 518642197424 to {}*), {}* inttoptr (i64 518796677128 to {}*))
unreachable
pass: ; preds = %top
%4 = getelementptr inbounds i8, i8* %1, i64 1
; ā
%unbox2 = load i8, i8* %4, align 1
ret i8 %unbox2
}
With my suggestion,
julia> abstract type AbstractExpressionNode{T} end
julia> mutable struct Node{
T,
C <: Union{Bool,Nothing},
F <: Union{UInt16,Nothing},
O <: Union{UInt8,Nothing}
} <: AbstractExpressionNode{T}
const degree::UInt8
const constant::C
val::Union{T,Nothing}
const feature::F
const op::O
l::Union{Node{T},Nothing}
r::Union{Node{T},Nothing}
function Node{T}(
degree, constant, val, feature, op,
l = nothing,
r = nothing
) where T
C = isnothing(constant) ?
Nothing : Bool
F = isnothing(feature) ?
Nothing : UInt16
O = isnothing(op) ?
Nothing : UInt8
new{T,C,F,O}(
degree, constant, val, feature, op, l, r
)
end
end
julia> n = Node{Int}(0,0,0,0,0)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)
julia> n.l = n
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, Node{Int64, Bool, UInt16, UInt8}(#= circular reference @-1 =#), nothing)
julia> n.r = Node{Int}(0, nothing, nothing, nothing, nothing)
Node{Int64, Nothing, Nothing, Nothing}(0x00, nothing, nothing, nothing, nothing, nothing, nothing)
julia> n
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, Node{Int64, Bool, UInt16, UInt8}(#= circular reference @-1 =#), Node{Int64, Nothing, Nothing, Nothing}(0x00, nothing, nothing, nothing, nothing, nothing, nothing))
The generated code can be cleaner since we can determine the actual type of the const fields from the full type of the node. Thus, we know that a type error is not possible.
julia> f(n) = n.constant
f (generic function with 2 methods)
julia> @code_llvm f(n)
; @ REPL[31]:1 within `f`
define i8 @julia_f_491({}* noundef nonnull align 8 dereferenceable(40) %0) #0 {
top:
; ā @ REPL[24]:3 within `getproperty`
%1 = bitcast {}* %0 to i8*
%2 = getelementptr inbounds i8, i8* %1, i64 1
; ā
%unbox = load i8, i8* %2, align 1
ret i8 %unbox
}
Interesting. Itās not an issue that the children l and r have unknown type parameters, and you have to do runtime dispatch on those? Thatās why I brought that up
Thatās a useful trick which Iāll definitely keep in my back pocket, but isnāt this compensating for a weakness in the current compiler? A const field declared of Union type picks a branch of the union during construction and stays that way, so surely it would be legal to automatically construct the Node type to be parametric across that Union and resolve it, the way youāre doing manually.
const fields in mutable structs are new as of 1.8 so itās forgivable that they arenāt completely integrated with the compiler yet, but this bit of sophistication shouldnāt be necessary, and I would have guessed that it isnāt. Note that it doesnāt have to effect the actual Node type parameter, it could be different compiler-specific metadata which provides the same affordance. I would find it surprising if the struct type grew parameters without an explicit declaration of them, but this doesnāt rule out a side channel.
No, a parameter distinguishing 2 types versus 1 type with a field that can be either of 2 types are semantically different and affect method compilation differently.
Donāt the typeasserts in the corresponding getproperty fix that in exchange for a runtime error? If it could somehow be guaranteed that nothing fields donāt ever get accessed, then the overhead of branching to an error should be minimal.
This is true of parameters, yes, but as I clarified below your quote, the observation isnāt actually about parameters, it is about the fact that, contra your reply, const fields of Union type cannot in fact be either of two types. Instances of Union donāt exist, and once the field takes a value instance of one arm of the union, that value, and therefore its type, cannot change.
Weāve discussed an approach like this as an alternative to default constructors:
struct A{T}
val::Union{T,Nothing}
l::Union{A{T},Nothing}
r::Union{A{T},Nothing}
end
But, for what itās worth, Aqua.jl flags this for unbound type parameters⦠(Unbound Type Parameters Ā· Aqua.jl). We would never actually trigger both fields to be nothing at the same time, but thereās no way to tell the compiler that.
Seems like thereās not really a clean solution here after all:
Expr(:new, T)or the analogous C method creates aT, but is apparently dangerous.- Moving
val::Tto the end of the struct lets you avoid needing to actually put anyTthere, but requires changing the order of your fields, and therefore isnāt extensible if you have multiple fields that you want to do this for. - Using
Union{T,Nothing}can generate type instabilities unless you manually patch in some type assertions (either at the access site or in a patchedgetproperty). This can also make Aqua complain about unbound type parameters (since you could make an object with all-nothingfields).- You can also add some type parameters to get around this, but to me it seems this might cause runtime dispatch since a node wouldnāt know all the type parameters of its children
Iāll probably just go with (2) for now.
Is it feasible to use a lookup for the constant values, just like you do for the operators? Something like this:
struct Node
degree::UInt8
constant::Bool
val::UInt32
feature::UInt16
op::UInt8
l::Base.RefValue{Node}
r::Base.RefValue{Node}
end
struct Expression{T, B, U}
values::Vector{T}
ops::OperatorEnum{B, U}
tree::Node
end
This is type stable. If a node is not a constant, then the value of the constant field will be 0. If a node is a constant, then you can look up its value with ex.values[val] (where ex is the parent Expression object).
(This example focuses on the val field. I donāt know how you want to handle the l and r fieldsāI just set them as Base.RefValue{Node} in this example.)
If youāre dealing with a whole forest of expression trees, you could probably have a single values lookup vector for all the constant values in the forest, like this:
struct Forest{T, B, U}
values::Vector{T}
ops::OperatorEnum{B, U}
trees::Vector{Node}
end
The children l and r are mutable so weāre going to have to do runtime dispatch on them anyways since they can change type. You can mitigate this with @nospecialize.
The type assertions in getproperty are now where the type instability manifest. If we compile just for Node{T} then we do not know whether the type assertions will throw or not so there is now a branch in the code paths. If we compile for Node{T, Bool, UInt16, UInt8} then we know for the constant fields that the type assertion will not throw and we can just elide out the branch that throws.
What we do need to weigh here is the cost of compilation versus the efficiencies gained. With the three constant fields, there are 8 types for each T that we have to compile for.
| n | C |
F |
O |
|---|---|---|---|
| 1 | nothing | nothing | nothing |
| 2 | nothing | nothing | UInt8 |
| 3 | nothing | UInt16 | nothing |
| 4 | nothing | UInt16 | UInt8 |
| 5 | Bool | nothing | nothing |
| 6 | Bool | nothing | UInt8 |
| 7 | Bool | UInt16 | nothing |
| 8 | Bool | UInt16 | UInt8 |
Thanks. Indeed Iām not sure how practical this is compared to simply having the field be constant but I will check to see.
The thing I was mainly concerned about with the Union{T,Nothing} approaches is whether it is amenable to static compilation. For example it doesnāt seem I can simply malloc structs made with union fieldsā¦
julia> struct MyStruct1
a::Union{Float64,Nothing}
end
julia> isbits(MyStruct1(1.0))
false
julia> struct MyStruct2
a::Float64
end
julia> isbits(MyStruct2(1.0))
true
(My assumption here is that T would be isbitstype FYI).
This is my original motivation for avoiding Union and any other type unstable code.
I see that I canāt simply work with undefined .val either in statically compiled code, as it needs to access all fields, leading me back to think about trying the jl_new_struct_uninit approach suggested by @foobar_lv2ā¦
You are mixing various āstaticā concepts.
Letās just talk about the meaning of isbits as below. All that we are asking below is whether it is immutable and can be composed of primitive types. In terms of memory allocation, we just need to understand the implications for memory layout.
julia> struct MyStruct3{A <: Union{Float64,Nothing}}
a::A
end
julia> isbits(MyStruct3(1.0))
true
julia> isbits(MyStruct3(nothing))
true
julia> isbitstype(MyStruct3{Float64})
true
julia> isbitstype(MyStruct3{Nothing})
true
julia> isbitstype(MyStruct3{Union{Float64,Nothing}})
false
julia> isbits(MyStruct3{Union{Float64,Nothing}}(1.0))
false
This has indirect implications for memory layout. Memory layout consists of two parts: 1) the data, and 2) the metadata to denote internal types. For bits types, we usually do not need any metadata because all the information we need is encoded in the overall type attached to binding. When there are abstract types or unions involved, we do need metadata to denote which types were actually used.
Below we see that the different parameters impact the number of bytes needed to store each type.
julia> sizeof(MyStruct3{Nothing})
0
julia> sizeof(MyStruct3{Float64})
8
julia> sizeof(MyStruct3{Union{Float64,Nothing}})
16
Seeing that we need at most 16 bytes to store each of the types above, letās malloc that and wrap that as an array for convenience. We could have also just used zeros below, but Iām using malloc explicitly because you mentioned it.
julia> ptr = Libc.malloc(16)
Ptr{Nothing} @0x0000000002e81bc0
julia> A = unsafe_wrap(Array, Ptr{UInt}(ptr), 2; own=false);
julia> fill!(A, 0)
2-element Vector{UInt64}:
0x0000000000000000
0x0000000000000000
First, letās look at the memory layout of MyFloat3{Float64}. From above we know this is 8-bytes, the same size of the Float64 contained within. Indeed all that we stored below is the floating point value.
julia> let S = MyStruct3{Float64}
fill!(A, 0)
unsafe_store!(Ptr{S}(ptr), S(1.0))
display(A)
end
2-element Vector{UInt64}:
0x3ff0000000000000
0x0000000000000000
julia> let S = MyStruct3{Float64}
fill!(A, 0)
unsafe_store!(Ptr{S}(ptr), S(0.5))
display(A)
end
2-element Vector{UInt64}:
0x3fe0000000000000
0x0000000000000000
The values 0x3ff and 0x3fe correspond to 1023 and 1022. These indicate the exponent of the floating point value offset by 1023. They thus correspond to 2^0 and 2^-1.
Next if we use MyStruct3{Union{Float64,Nothing}} we get the following.
julia> let S = MyStruct3{Union{Float64,Nothing}}
fill!(A, 0)
unsafe_store!(Ptr{S}(ptr), S(1.0))
display(A)
end
2-element Vector{UInt64}:
0x3ff0000000000000
0x0000000000007a01
julia> let S = MyStruct3{Union{Float64,Nothing}}
fill!(A, 0)
unsafe_store!(Ptr{S}(ptr), S(nothing))
display(A)
end
2-element Vector{UInt64}:
0x00007fa5a4e9bc08
0x0000000000000200
julia> let S = MyStruct3{Union{Float64,Nothing}}
fill!(A, 0)
unsafe_store!(Ptr{S}(ptr), S(0.25))
display(A)
end
2-element Vector{UInt64}:
0x3fd0000000000000
0x0000000000007a01
The first 8-bytes are the same as with MyStruct3{Float64} but the second set of eight bytes is now populated. The second set of eight bytes corresponds to the type.
Thus there is really no issue with using malloc with structs with Union types.
That seems like a drastic choice when there are simple options using vanilla Julia code, like the one I mentioned above:
Vanilla Julia Solution
struct Node
degree::UInt8
constant::Bool
val::UInt32
feature::UInt16
op::UInt8
l::Base.RefValue{Node}
r::Base.RefValue{Node}
end
struct Expression{T, B, U}
values::Vector{T}
ops::OperatorEnum{B, U}
tree::Node
end
Itās worth emphasizing that if you use jl_new_struct_uninit you are relying on Julia internals that could change with any new minor or patch release of Julia. So if your package uses jl_new_struct_uninit, you need to upper bound the Julia version in the compat section of your Project.toml file to the most recent Julia version on which youāve tested that your package works. Nobody in the Julia ecosystem does this, but they should. The Julia ecosystem takes a very cavalier attitude towards using internals of other packages (either Base or other third-party packages).
Thanks, @mkitti, this is extremely helpful. Just to follow-up quickly, how would you actually use that in code? And there wouldnāt be any problems with recursive types, right? This entire thread has really been my attempt to figure out how to allocate the Node struct manually so I can do stuff with it in a StaticCompiler.jl context. If I can do that, Iām happy.
mutable struct Node{T} <: AbstractExpressionNode{T}
const degree::UInt8
const constant::Union{Bool,Nothing}
val::Union{T,Nothing}
const feature::Union{UInt16,Nothing}
const op::Union{UInt8,Nothing}
l::Union{Node{T},Nothing}
r::Union{Node{T},Nothing}
end
@CameronBieganek This is unfortunately not a simple solution for me for several reasons (not to mention breaking an ecosystem of libraries using recursive methods that assume Node{T} is self-contained). This would lead to many new complexities like:
- Say you want to manipulate a single node and turn it from a constant (
1.52) into a feature index (x1). What do you do to thevaluesvector? The expression tree has potentially 1,000,000 nodes in it. Does that mean you need to go through the entire expression tree and update all the indices? Keep in mind you want to do millions of these mutations per second. - Say you want to do a crossover two expressions (removing half of one tree while doing so). You need to figure out a smart way of combining their value vectors, which might have out-of-order indexing (again, you can have massive amounts of nodes in a single tree). (Equivalently, you might think of doing an algebraic simplification)
- There is the
GraphNode{T}type which is also<:AbstractExpressionNode{T}. This node has the property of being a graph, rather than a binary tree. This means that some nodes have two parents! And therefore their constant values are shared. (Think of this like a multi-step equation, rather than a closed-form expression). This is super simple to do with self-contained nodes that can simply point at the same node, but would become impossibly difficult with this sort of global storage.
Basically it is much easier to do manipulations to expression trees that are self-contained : )
Youāre entirely missing my point. Maybe a demo:
mutable struct X const val::Union{Bool, Nothing} end
mutable struct Y{T<:Union{Bool, Nothing}} const val::T end
println(typeof(X(true)) == typeof(X(nothing))) # true
println(typeof(Y(true)) == typeof(Y(nothing))) # false
This makes it impossible to automatically transform X into Y, and consting the field doesnāt make a difference. If you want it to, youād need a breaking change to the type system.
I agree with the code branch part, Iām saying the typeassert in getproperty narrows down the inferred type by erroring if it encounters any other type:
getval(x::X) = x.val
println(Base.return_types(getval, (X,))) # Any[Union{Nothing, Bool}]
@inline function Base.getproperty(x::X, name::Symbol)
if name == :val getfield(x, :val)::Bool end
end
println(Base.return_types(getval, (X,))) # Any[Bool]
But checking whether the fieldās value has the allowed type needs getfield now.
Nothing happens to the values vector in that case. You just leave it alone. ![]()
Previously I mentioned just using one values vector for the whole forest of expression trees, something like this:
struct Forest{T, B, U}
values::Vector{T}
ops::OperatorEnum{B, U}
trees::Vector{Node}
end
What I had in mind is that each constant ID is uniqueāit only appears in one node instance. You can mutate the values vector if you want to change the constant value for that particular node. If you want to create a new node, you push! a new constant onto the values vector. In this scenario, you would never delete a constant value from the values vector (with, e.g, deleteat!). Crossovers are easyāyou donāt need to touch the values vector at all, you just swap out Nodes in your expression (and leave the val IDs the same).
Anyhow, Iām going to bow out of this conversation now. I need to spend less time on Discourse.
Iām not sure what is unclear, and Iām not sure if this works the way you think it does.
julia> n = Node{Int}(0,0,0,0,0, nothing, nothing)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)
julia> ptr = Ptr{typeof(n)}(Libc.malloc(sizeof(n)*5))
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0
julia> unsafe_store!(ptr, n)
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0
julia> unsafe_load(ptr)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)
julia> n.val = 5
5
julia> unsafe_store!(ptr, n, 2)
Ptr{Node{Int64, Bool, UInt16, UInt8}} @0x000000000698b9a0
julia> unsafe_load(ptr, 2)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 5, 0x0000, 0x00, nothing, nothing)
julia> unsafe_load(ptr, 1)
Node{Int64, Bool, UInt16, UInt8}(0x00, false, 0, 0x0000, 0x00, nothing, nothing)
This has no bearing whatsoever on the ability of the compiler to emit type stable code for instances of a Union which are constant.
To cite the post you are replying to:
const isnāt a characteristic of an instance, itās a characteristic of a variable or field. You still havenāt addressed how that would be factored into the suggested compiler optimization, whether itās a formal type transformation or not. Very specifically, how could:
mutable struct X const val::Union{Bool, Nothing} end
foo(x::X) = (x, fill(x.val, 1))
@code_warntype foo(X(false)) # ::Tuple{X, Union{Vector{Bool}, Vector{Nothing}}}
be type-stable? foo(::X) must infer val::Union{Bool, Nothing} because it must be compiled for any instance of X (copies of X(nothing), X(true), X(false)). val being a const field alone does not get around that at all because the compiler does not distinguish instances of X based on an unknown val instance.
You would need to provide the compiler with the value of val for a possible inference optimization:
const myval = false
# mutable X(myval) is not a compile-time value, but myval is
foo2() = @inline foo(X(myval))
# mutable X(staticval) is not a compile-time value, but staticval is
foo3(::Val{staticval}) where staticval = @inline foo(X(staticval))
That is possible but also very limited. For example, the compiler canāt tell which elements of xvector::Vector{X} have the more inferrable val after a few push!(xvector, X(myval))