Arrays of abstract types within a parametric type

Hi all, I’m wondering if this behavior is intentional. Let’s look at this example:

# This works
struct A
  n::Real
  v::Vector{Real}
end

# This works
struct B{T}
  n::T
  v::Real
end

# This doesn't work
struct C{T}
  n::T
  v::Vector{Real}
end

julia> A(1.0, [2.0, 3.0])
A(1.0, Real[2.0, 3.0])

julia> B(1.0, 2.0)
B{Float64}(1.0, 2.0)

julia> C(1.0, [2.0, 3.0])
ERROR: MethodError: no method matching C(::Float64, ::Vector{Float64})
Closest candidates are:
  C(::T, ::Vector{Real}) where T at REPL[3]:3

I know I could define a second type parameter for C to make it work and that would be fine, although perhaps a bit more messy in some cases. The error is just a bit unexpected because the first two cases work… It’s a bit hard to anticipate what’s acceptable and what’s not.

-Tuukka

EDIT: I had Array instead of Vector by mistake, now corrected.

julia> methods(A)
# 2 methods for type constructor:
[1] A(n::Real, v::Array{Real}) in Main at REPL[1]:2
[2] A(n, v) in Main at REPL[1]:2

julia> methods(C)
# 1 method for type constructor:
[1] C(n::T, v::Array{Real}) where T in Main at REPL[4]:

Since A is a concrete (non-parametric) type, it can convert arguments using the A(n, v) method, whereas C evidently can’t. Maybe someone can comment on why the constructor for C can’t convert the array.
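
As a rough sketch of why A accepts the call: for a non-parametric struct, Julia generates two constructor methods, one matching the field types exactly and one that accepts anything and calls convert on each argument. The struct below, under the hypothetical name A2, writes out by hand constructors that should behave like those defaults. It is an illustration only, not the code Julia actually generates.

```julia
# Hypothetical stand-in for `A`, with the two default constructors written out by hand.
struct A2
    n::Real
    v::Vector{Real}
    # Method 1: arguments already match the field types exactly.
    A2(n::Real, v::Vector{Real}) = new(n, v)
    # Method 2: fallback that converts each argument to its field type.
    A2(n, v) = new(convert(Real, n), convert(Vector{Real}, v))
end

# [2.0, 3.0] is a Vector{Float64}, so this call dispatches to the fallback.
a = A2(1.0, [2.0, 3.0])
println(typeof(a.v))          # the stored vector is a Vector{Real}
println(length(methods(A2)))  # 2
```

The fallback is what turns the Vector{Float64} into a Vector{Real}; a parametric struct like C has no equivalent untyped method on the bare name C.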

Although, I wonder if you’re looking specifically for an array of Real, or are willing to accept subtypes at the expense of the struct not being concretely typed?

E.g.

julia> struct D{T}
         n::T
         v::Array{<:Real}
       end

julia> methods(D)
# 1 method for type constructor:
[1] D(n::T, v::Array{<:Real}) where T in Main at REPL[8]:2

julia> D(1.0, [2.0, 3.0])
D{Float64}(1.0, [2.0, 3.0])

Hey thanks for the tip, that would work for me! I wasn’t aware you can insert the inequality there directly (I was thinking of something like struct D{T,U} where {U<:Real}).

Still, my point remains that in cases like this it’s difficult to anticipate what will throw an error and what will pass, unless you have a deep understanding of the type system (i.e. in what cases the conversion is possible). Then you end up spending quite some time experimenting… I’m wondering whether that could be fixed by trying to make the behavior more consistent.

Be aware that Array{Real} is not a concrete type:

julia> Array{Real} |> isconcretetype
false

julia> Array{Real} |> typeof
UnionAll

Are you perhaps looking for Vector{<:Real} (which is the same as Array{<:Real, 1})?

julia> Vector{<:Real}
Vector{<:Real} (alias for Array{<:Real, 1})

IMO the behavior is perfectly consistent: subtyping in Julia just isn’t covariant but invariant. Thus, even though we have Float64 <: Real, we DO NOT have Vector{Float64} <: Vector{Real} (but we DO have Vector{Float64} <: Vector{<:Real}, because the second vector type is not a concrete or abstract type but a UnionAll). Widening the element type of a concretely typed vector would require allocating a new array to box each element. That this “conversion” isn’t done automatically shouldn’t be surprising IMO, as it wouldn’t preserve object identity on field assignment.
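
A minimal sketch of the invariance point, using only Base functions. The variable names v, w and same are just for illustration.

```julia
println(Float64 <: Real)                    # true
println(Vector{Float64} <: Vector{Real})    # false: type parameters are invariant
println(Vector{Float64} <: Vector{<:Real})  # true: Vector{<:Real} is a UnionAll

v = [2.0, 3.0]
w = convert(Vector{Real}, v)        # allocates a new array with boxed elements
println(w === v)                    # false: a different object
same = convert(Vector{Float64}, v)  # already the right type, nothing to do
println(same === v)                 # true: the same object is returned
```

So converting to Vector{Real} always produces a copy, which is the loss of object identity mentioned above.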

Indeed, I meant Vector, Array was a “typo”… I fixed it in the excerpt, it doesn’t really change the output. Btw, I’m using SVectors, but used Vector in this example for simplicity.

So it kinda makes sense in light of the documentation that C doesn’t work, but then why does A work? Clearly it has something to do with A being concrete and C not, but it still seems unexpected that one works and the other doesn’t.

While this works, it is not necessarily a good solution. For example:

julia> d1 = D(2.1, [3, 4])
D{Float64}(2.1, [3, 4])

julia> d2 = D(2.1, [0x7 0x5])
D{Float64}(2.1, UInt8[0x07 0x05])

julia> typeof(d1) == typeof(d2)
true

Both values have the same type, D{Float64}, so the compiler cannot tell them apart, even though one contains a Vector{Int} and the other a Matrix{UInt8}.

It is definitely recommended to also parameterize the array.
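
To make the difference concrete, here is a small sketch comparing the two approaches. Loose and Tight are hypothetical names: Loose mirrors D with its Array{<:Real} field, and Tight adds a second type parameter for the array.

```julia
# Hypothetical struct mirroring `D`: the array field has an abstract (UnionAll) type.
struct Loose{T}
    n::T
    v::Array{<:Real}
end

# Hypothetical alternative: the array type is a parameter, so the field is concrete.
struct Tight{T, V<:AbstractArray{<:Real}}
    n::T
    v::V
end

l1 = Loose(2.1, [3, 4]);  l2 = Loose(2.1, [0x7 0x5])
t1 = Tight(2.1, [3, 4]);  t2 = Tight(2.1, [0x7 0x5])

println(typeof(l1) == typeof(l2))                   # true: the array type is not recorded
println(typeof(t1) == typeof(t2))                   # false: Vector{Int} vs Matrix{UInt8}
println(isconcretetype(fieldtype(typeof(l1), :v)))  # false
println(isconcretetype(fieldtype(typeof(t1), :v)))  # true
```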

There might be times when one wants this behavior to avoid excessive compilation time. IIRC DataFrames doesn’t parameterize their structs for this reason.

julia> d = DataFrame([:a => 1:2, :b => 0])
2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> d2 = DataFrame([:a => Float64.(1:2), :b => 0])
2×2 DataFrame
 Row │ a        b     
     │ Float64  Int64 
─────┼────────────────
   1 │     1.0      0
   2 │     2.0      0

julia> typeof(d) == typeof(d2)
true

Still, I think it’s important to point out, since the Vector{<:Real} type specification is perfectly fine in function signatures, but can have very different consequences in struct definitions. And also because deliberately avoiding specialization is probably for advanced and demanding use cases. DataFrames isn’t exactly a run-of-the-mill small beginner project.

It has to do with which default constructors are available:

julia> struct C{T}
         n::T
         v::Vector{Real}
       end

julia> C{Float64}(1.0,[1.0,2.0])
C{Float64}(1.0, Real[1.0, 2.0])

julia> C(1.0,[1.0,2.0])
ERROR: MethodError: no method matching C(::Float64, ::Vector{Float64})
Closest candidates are:
  C(::T, ::Vector{Real}) where T at REPL[1]:2
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

And I think this is, effectively, something missing. C(1.0, [1.0,2.0]) should work, as A does, and be equivalent to C{Float64}(1.0, [1.0,2.0]) given that the type parameter is not ambiguous. You can define it yourself, of course:

julia> C(a::T,b) where T = C{T}(a,b)
C

julia> C(1.0, [1.0,2.0])
C{Float64}(1.0, Real[1.0, 2.0])

Maybe there is some intricate reason for this to be hard to implement in general, but I have the impression of having seen some discussion and a related bug report mentioning this kind of thing already.

Note also that using Vector{Real} can be bad for performance, because that is an array that can contain mixed types of numbers:

julia> x = Real[1, 1.0, 1.f0]
3-element Vector{Real}:
 1
 1.0
 1.0f0

What you probably want is something like this:

julia> struct D{T1<:Real,T2<:Real}
         n::T1
         v::Vector{T2}
       end

julia> D(1.0,[1,2])
D{Float64, Int64}(1.0, [1, 2])

where all types are concrete and the array is a homogeneous container.

I had a feeling this is some sort of a bug/oversight, good to hear someone agrees :). For consistency, the default constructor should be available for C as well, and good to hear something like that might be in the works…

Yes no doubt, for rigorous code for libraries etc. it’s probably better to just define all the type parameters. Personally, however, I like to avoid too much syntactic complexity where it does not bring a concrete benefit (small codes/scripts), and too many capital letters after the struct definition feels like an eyesore… Of course I could just leave everything untyped like in Python, but I sometimes like to specify some types for clarity, and also to practice using the type system :). In my code I actually just wanted to specify that the vector is a 3-element SVector, v::SVector{3,Real}, and the element type was not that important.

Perfectly fine, although using static arrays with boxed fields seems like a contradiction :)

My gut tells me that this constructor doesn’t exist because it’s only straightforward in this case, where T is only used once. Consider:

struct E{T <: Real}
   a::T
   b::Vector{T}
end

Now think about calling E(1, [1.0]): which argument (if any) should be converted? Both E(1.0, [1.0]) and E(1, [1]) are valid. You also immediately run into problems with e.g. E(Int8(-1), [0xff]), because both conversions and the type promotion throw:

julia> convert(Int8, 0xff)
ERROR: InexactError: check_top_bit(Int8, 255)

julia> convert(UInt8, Int8(-1))
ERROR: InexactError: check_top_bit(UInt8, -1)

julia> promote(Int8(-1), 0xff)
ERROR: InexactError: check_top_bit(UInt8, -1)

The non-parametric version with explicit Int and Vector{Int} fields is perfectly fine with any conversion, because such an ambiguity can never occur! All fields can be promoted/converted in isolation. So IMO it’s perfectly ok to say “no, be specific with what you want to enforce” when talking about parametric types.
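
A short sketch of that trade-off with the E struct from above: the bare E call refuses to guess, while naming T explicitly makes the conversion unambiguous. The variable names are only illustrative.

```julia
struct E{T <: Real}
    a::T
    b::Vector{T}
end

# T is stated explicitly, so each field can be converted in isolation.
e_float = E{Float64}(1, [1.0])   # converts 1 to 1.0
e_int   = E{Int}(1, [1.0])       # converts [1.0] to [1]

# Without an explicit T, the arguments disagree and no method matches.
threw_method_error = try
    E(1, [1.0])
    false
catch err
    err isa MethodError
end
println(threw_method_error)      # true
```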

There is a default constructor - that is exactly what C (no parameters) is. That is just a distinct method & method table from C{T}:

What’s really going on here is that Point, Point{Float64} and Point{Int64} are all different constructor functions. In fact, Point{T} is a distinct constructor function for each type T.
[…]
When the type is implied by the arguments to the constructor call, as in Point(1,2), then the types of the arguments must agree – otherwise the T cannot be determined – but any pair of real arguments with matching type may be given to the generic Point constructor.

and to prove that:

julia> struct C{T}
           n::T
           v::Vector{Real}
       end

julia> methods(C)
# 1 method for type constructor:
[1] C(n::T, v::Vector{Real}) where T in Main at REPL[1]:2

julia> methods(C{Int})
# 1 method for type constructor:
[1] C{T}(n, v) where T in Main at REPL[1]:2

julia> methods(C)[1] === methods(C{Int})[1]
false

### with differently typed fields:
julia> struct C{T <: Real}
           n::T
           v::Vector{T}
       end

julia> methods(C)
# 1 method for type constructor:
[1] C(n::T, v::Vector{T}) where T<:Real in Main at REPL[1]:2

julia> methods(C{Int})
# 1 method for type constructor:
[1] C{T}(n, v) where T<:Real in Main at REPL[1]:2

julia> methods(C)[1] === methods(C{Int})[1]
false

So no, the behavior is not a bug or broken, it is an expected consequence of an ambiguity that comes up when this (inevitably) has to be generalized to more than one argument constraining the T.
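
For reference, the two method tables shown above can be approximated by writing the constructors out by hand. This is a sketch under the hypothetical name C2, meant to mirror the default constructors of the original C; it is not the code Julia actually generates.

```julia
struct C2{T}
    n::T
    v::Vector{Real}
    # Inner constructor: T is given explicitly, so both arguments can be converted.
    C2{T}(n, v) where {T} = new{T}(convert(T, n), convert(Vector{Real}, v))
end

# Outer constructor: T is inferred from `n`; `v` must already be a Vector{Real}.
C2(n::T, v::Vector{Real}) where {T} = C2{T}(n, v)

c_inner = C2{Float64}(1, [2.0, 3.0])   # converts 1 to 1.0 and the vector to Vector{Real}
c_outer = C2(1.0, Real[2.0, 3.0])      # works: the vector is already a Vector{Real}

# A Vector{Float64} is not a Vector{Real}, so the outer constructor does not match.
outer_rejects = try
    C2(1.0, [2.0, 3.0])
    false
catch err
    err isa MethodError
end
println(outer_rejects)                 # true
```

Only the C2{T} method converts; the bare C2 method exists solely to work out T from the arguments.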

Interesting, so what OP has defined actually does work as expected, except we need to call the typed constructor:

julia> C{Float64}(1.0, [2.0, 3.0])
C{Float64}(1.0, Real[2.0, 3.0])

This is all reasonable, but similar choices were made already in situations that facilitate our lives:

julia> [ 1, 1.0 ]
2-element Vector{Float64}:
 1.0
 1.0

One could argue that this should either error or produce Real[ 1, 1.0 ], or something else. It is arbitrary, but convenient. We could well have, similarly,

julia> struct A{T}
           x::T
           y::T
       end

julia> function A(x::T1,y::T2) where {T1, T2}
           T = promote_type(T1,T2)
           A{T}(x,y)
       end
A

julia> A(1, 1.0)
A{Float64}(1.0, 1.0)

Here is the issue (which links some others) discussing this behavior: https://github.com/JuliaLang/julia/issues/17186

Thanks, so my case B was discussed there and a constructor has since been added. Hopefully someone will also suggest automatic constructors for cases like C.

By the way, thanks to everyone for the input and discussion! Seems like a lively community :).

To reflect a bit more on my use of typing, I do use careful typing in data structures for performance critical parts. I use SVectors for geometric points/vectors. In the rest of the data containers, the typing is more for consistency/aesthetics (while I try to keep it out of the way so that I don’t need to add too many explicit conversions etc). That’s why I don’t want to go overboard with the type parameters and such.

My rule of thumb is to be as specific as possible in structs, and as lenient as possible in function signatures. The reasoning is that struct layout & concreteness of fields impacts performance the most, which you can steer extremely precisely with concretely typed fields. If you have abstractly typed fields, the field will be boxed, every access has to chase that pointer and check the type, and things slow down just from cache misses alone.
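
One way to see the layout difference is isbitstype, which reports whether a type is immutable and contains no references to other values. BoxedPoint and PlainPoint are hypothetical names used only for this sketch.

```julia
# Hypothetical struct with abstractly typed fields: each field is a reference to a boxed value.
struct BoxedPoint
    x::Real
    y::Real
end

# Hypothetical struct with concretely typed fields once T is fixed.
struct PlainPoint{T <: Real}
    x::T
    y::T
end

println(isbitstype(BoxedPoint))            # false: fields are references
println(isbitstype(PlainPoint{Float64}))   # true: two Float64 values stored inline
println(sizeof(PlainPoint{Float64}))       # 16
```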

Typing arguments in function signatures only influences dispatch & restricts what the function can actually receive; it has no impact on whether a function is compiled “generically” or not (though there are some exceptions where specialization is avoided, but those are edge cases and not the general norm). The optimizations for f(x) and f(x::Int) called with y = 5 are exactly the same, except the latter can’t take anything other than Int.
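
A small sketch of this point, with hypothetical functions f_untyped and f_typed. It uses Base.return_types, an unexported introspection helper, to compare what inference concludes for an Int argument. Matching inferred return types are only a hint that both get specialized the same way, not proof of identical machine code.

```julia
f_untyped(x) = 2x + 1        # accepts anything that supports * and +
f_typed(x::Int) = 2x + 1     # same body, restricted to Int

# Inference sees the same concrete argument type in both cases.
rt_untyped = only(Base.return_types(f_untyped, (Int,)))
rt_typed   = only(Base.return_types(f_typed, (Int,)))
println(rt_untyped == rt_typed)   # true

# The annotation only restricts which calls are accepted.
println(f_untyped(1.0))           # 3.0
typed_rejects_float = try
    f_typed(1.0)
    false
catch err
    err isa MethodError
end
println(typed_rejects_float)      # true
```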

The fact that this has been open for almost 6 years indicates to me that it’s not a pressing issue and unlikely to be decided soon. As you say, any choice would be an arbitrary one - I for one would be in favor of a better error message for this, explaining why it doesn’t exist right now.
