Array construction of composite types failed to provide a consistent bound for its corresponding element type

MWV:

julia> struct myT{T1, T2<:Tuple{Number, Any}}
       a::T1
       b::T2
       end

julia> m1 = myT(1.0, (1, :a))
myT{Float64, Tuple{Int64, Symbol}}(1.0, (1, :a))

julia> m2 = myT(1.1, (1.1, :b))
myT{Float64, Tuple{Float64, Symbol}}(1.1, (1.1, :b))

julia> v1 = [m1, m2]
2-element Vector{myT{Float64}}:
 myT{Float64, Tuple{Int64, Symbol}}(1.0, (1, :a))
 myT{Float64, Tuple{Float64, Symbol}}(1.1, (1.1, :b))

julia> v2 = myT{Float64, <:Tuple{Real, Symbol}}[v1...]
2-element Vector{myT{Float64, <:Tuple{Real, Symbol}}}:
 myT{Float64, Tuple{Int64, Symbol}}(1.0, (1, :a))
 myT{Float64, Tuple{Float64, Symbol}}(1.1, (1.1, :b))

julia> eltype(v2) <: eltype(v1)
true

julia> eltype(v1) <: eltype(v2)
false

I would like to get the array container like v2 without a manual type assertion at the construction. Can this be easily done with some helper function? Thanks!

I don’t know how to do that (probably a proper answer depends on the exact variability of the types you want to support). But, anyway, the most narrow container would be parameterized by the Union of the two types in this case. For performance that probably makes a difference when using the vectors in this specific example.

1 Like

As types are first-class values, you could just define

julia> narrow(v) = Vector{Union{unique(typeof.(v))...}}(v)
narrow (generic function with 1 method)

julia> [m1, m2] |> narrow
2-element Vector{Union{myT{Float64, Tuple{Float64, Symbol}}, myT{Float64, Tuple{Int64, Symbol}}}}:

Not sure, when this would be worth it though.

“The ‘most narrow’ element type” is probably not the most accurate description of my desire. What I want, more specifically, is a consistent type inference across different levels of the composite type myT.

For instance, if I construct an array of the .a fields from v1, I get

julia> v1_a = getfield.(v1, :a)
2-element Vector{Float64}:
 1.0
 1.1

and the element type of v1_a is consistent with (the upper bound) of the first type parameter of eltype(v1):

julia> eltype(v1_a) == eltype(v1).body.parameters[1]
true

However, this is not true for v1_b:

julia> v1_b = getfield.(v1, :b)
2-element Vector{Tuple{Real, Symbol}}:
 (1, :a)
 (1.1, :b)

julia> eltype(v1_b) == eltype(v1).body.parameters[2].ub
false

This means that as we construct more layers of composite-type instances as containers of parametric-type objects, the type inference capability of the compiler degrades. Consequently, I run into issues maintaining the type stability of methods that operate on multiple levels of highly hierarchical composite types with predefined type parameter bounds. Sometimes, I even encounter MethodError with the input argument as an array, even though the method should have worked if the compiler had not failed to propagate the type inference consistently.

Element type inference isn’t done by the compiler, there’s a separate mechanism for Array literals, and in your case it’s making a broader type than you’d like. You also don’t want the compiler doing element type inference the way it does variable type inference because it would infer the Union of the two types that lmiq mentioned, which you also don’t want. The discrepancy is because the compiler is incentivized to narrow things down as much as possible, while a mutable array is incentivized to permit more types by typejoining or promotion. Manual inference is the way to go here because no automatic inference can satisfy every use case; the type you did get or the narrowest Union could be satisfactory results in other contexts, and the algorithm cannot know which.

1 Like

Thanks for the clarification!

Personally, I would still hope for a more consistent type bound across different levels of a composite type. Regardless of how differently the compiler-based type inference and the specific mechanism for array-like objects work, I would think v1_b and v1_c have a consistent type parameter bound since they have a similar IR procedure:

julia> v1_b = getfield.(v1, :b)
2-element Vector{Tuple{Real, Symbol}}:
 (1, :a)
 (1.1, :b)

julia> v1_c = identity.(v1)
2-element Vector{myT{Float64}}:
 myT{Float64, Tuple{Int64, Symbol}}(1.0, (1, :a))
 myT{Float64, Tuple{Float64, Symbol}}(1.1, (1.1, :b))

julia> @code_lowered getfield.(v1, :b)
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.getfield, x1, x2)
│   %2 = Base.materialize(%1)
└──      return %2
)

julia> @code_lowered identity.(v1, :b)
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.identity, x1, x2)
│   %2 = Base.materialize(%1)
└──      return %2
)

The IR is showing nothing about the runtime element type inference in materialize. It’s not even consistent per input type, it heavily relies on the runtime values:

julia> secret::Int = 5; foo(v) = v >= secret ? v : missing
foo (generic function with 1 method)

julia> (x = rand(1:10, 5)) |> println
[5, 4, 2, 2, 6]

julia> foo.(x) |> println
Union{Missing, Int64}[5, missing, missing, missing, 6]

julia> secret = 0; foo.(x) |> println
[5, 4, 2, 2, 6]

julia> secret = 10; foo.(x) |> println
[missing, missing, missing, missing, missing]

The method foo, input x’s type, and even x’s values didn’t change, but the runtime value of a concretely typed global variable still made the output’s element type vary across Union{Missing, Int}, Int, or Missing. This usually trips up people starting to process missing data in tables.

It’s not impossible for hypothetical broadcasting to start a deterministic inference at the input arrays’ element types, but again, the standard and only reliable way to do that is manual inference. Automatic inference (and promotion, as you can see in the 1 → 1.0 change) can only guess at what other elements you might want to put in the array for your situation because there’s no one right answer.