Struct with abstract-type field

If I define a field inside a struct with an abstract type (for instance AbstractArray), am I making it unnecessarily hard for inference to guess the type?

Yes, inference will “fail” for that field. As far as I know, it makes no difference to inference whether you annotate the field as AbstractArray or as Any, so you have to use a type parameter.

(Note though that if you don’t use that field inside a function, it has no negative impact on performance)

My guess is that, to get performant code with such a field, you always have to do the work on that field in a separate function (a function barrier), so that the compiler can specialize on the concrete type stored in the field.

julia> struct Test1
       x::AbstractArray
       end

julia> function notgood(a::Test1)
       r = zero(eltype(a.x))
       for el in a.x
       r+=el
       end
       return r
       end
notgood (generic function with 1 method)

julia> a = Test1(rand(100));

julia> @code_warntype notgood(a)
Variables:
  #self# <optimized out>
  a::Test1
  el::Any
  #temp#::Any
  r::Any

Body:
  begin
      r::Any = (Main.zero)((Base.eltype)((Base.typeof)((Core.getfield)(a::Test1, :x)::AbstractArray)::Type{#s45} where #s45<:AbstractArray{T,N} where N where T)::Any)::Any # line 3:
      SSAValue(0) = (Core.getfield)(a::Test1, :x)::AbstractArray
      #temp#::Any = (Base.start)(SSAValue(0))::Any
      5:
      unless !((Base.done)(SSAValue(0), #temp#::Any)::Any)::Any goto 14
      SSAValue(1) = (Base.next)(SSAValue(0), #temp#::Any)::Any
      el::Any = (Core.getfield)(SSAValue(1), 1)::Any
      #temp#::Any = (Core.getfield)(SSAValue(1), 2)::Any # line 4:
      r::Any = (r::Any + el::Any)::Any
      12:
      goto 5
      14:  # line 6:
      return r::Any
  end::Any

julia> function good(a::Test1)
       return _good(a.x)
       end
good (generic function with 1 method)

julia> function _good(x::AbstractArray)
       r = zero(eltype(x))
       for el in x
       r+=el
       end
       return r
       end
_good (generic function with 1 method)

julia> @code_warntype _good(a.x)
Variables:
  #self# <optimized out>
  x::Array{Float64,1}
  el::Float64
  #temp#::Int64
  r::Float64

Body:
  begin
      r::Float64 = (Base.sitofp)(Float64, 0)::Float64 # line 3:
      #temp#::Int64 = 1
      4:
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(x::Array{Float64,1})::Int64, 1)::Int64)::Bool)::Bool goto 14
      SSAValue(2) = (Base.arrayref)(x::Array{Float64,1}, #temp#::Int64)::Float64
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      el::Float64 = SSAValue(2)
      #temp#::Int64 = SSAValue(3) # line 4:
      r::Float64 = (Base.add_float)(r::Float64, el::Float64)::Float64
      12:
      goto 4
      14:  # line 6:
      return r::Float64
  end::Float64

julia> @code_warntype good(a)
Variables:
  #self# <optimized out>
  a::Test1

Body:
  begin
      return (Main._good)((Core.getfield)(a::Test1, :x)::AbstractArray)::Any
  end::Any

julia> using BenchmarkTools

julia> @benchmark notgood(a)
BenchmarkTools.Trial:
  memory estimate:  6.27 KiB
  allocs estimate:  301
  --------------
  minimum time:     12.749 μs (0.00% GC)
  median time:      13.057 μs (0.00% GC)
  mean time:        14.145 μs (4.18% GC)
  maximum time:     2.090 ms (96.06% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark good(a)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     128.816 ns (0.00% GC)
  median time:      130.499 ns (0.00% GC)
  mean time:        142.417 ns (0.44% GC)
  maximum time:     2.576 μs (86.30% GC)
  --------------
  samples:          10000
  evals/sample:     888

When I don’t know the exact type of a field, I usually make it a parameter of the struct, like this:

struct Test2{T<:AbstractArray}
  x::T 
end
Test2(x) = Test2{typeof(x)}(x)
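
To check what the type parameter buys you, here is a minimal sketch, assuming Julia 1.x (for `isconcretetype`). The names `Wrapped`, `Unwrapped`, and `sumfield` are hypothetical and only mirror the structs and loop discussed above.

```julia
# Parametric wrapper: the field type becomes part of the struct's type.
struct Wrapped{T<:AbstractArray}
    x::T
end

# Counterpart with an abstract field, for comparison.
struct Unwrapped
    x::AbstractArray
end

# Hypothetical helper: same loop as in the examples above.
function sumfield(a)
    r = zero(eltype(a.x))
    for el in a.x
        r += el
    end
    return r
end

w = Wrapped(rand(100))      # default constructor infers T = Vector{Float64}
u = Unwrapped(rand(100))

# The parametric struct records the concrete array type.
typeof(w) == Wrapped{Vector{Float64}}
isconcretetype(fieldtype(typeof(w), :x))    # concrete field type
isconcretetype(fieldtype(typeof(u), :x))    # abstract field type

# Inferred return types: concrete for `w`; for `u` inference cannot
# narrow the result because the field type is abstract.
Base.return_types(sumfield, (typeof(w),))
Base.return_types(sumfield, (typeof(u),))
```

Both calls compute the same sum; only the amount of type information available at compile time differs.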

Actually, I did specify the parameter.

mutable struct A
  xyz::AbstractArray{Float64,2}
end

Thanks, these are good points.

In this case this is what I would do:

mutable struct A{T<:AbstractArray{Float64,2}}
  xyz::T 
end

I can see how this might be good for flexibility, but is it necessary for inferrability of the type?

It’s necessary for inferrability of the field, and if you think about it, it has to be. If you can change at any time what kind of array is stored there, e.g. the field currently holds an Array and you do a.xyz = @SMatrix [2.0 2.0; 1.0 1.0] on an instance a, then how is the compiler ever supposed to know at compile time which type the field will hold at runtime? If the field is not strictly typed, then its type is runtime information, so of course it cannot be inferred.

Type parameters should be thought of as building a whole class or family of related types. Each type parameter makes a separate type. This specific type is like A, but it has the property that xyz has to be a specific type, which then allows inference.
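
A small sketch of that difference, using only Base types. The struct names `Loose` and `Strict` are hypothetical stand-ins for the two versions of A discussed above.

```julia
# Abstract field: any 2-d Float64 array may be stored at runtime.
mutable struct Loose
    xyz::AbstractArray{Float64,2}
end

# Parametric field: each choice of T gives a separate concrete type.
mutable struct Strict{T<:AbstractArray{Float64,2}}
    xyz::T
end

loose  = Loose(zeros(2, 2))
strict = Strict(zeros(2, 2))      # a Strict{Matrix{Float64}}

# The loose field happily switches to a different array type,
# so its concrete type is only known at runtime.
loose.xyz = view(ones(3, 3), 1:2, 1:2)
loose.xyz isa SubArray

# The strict field is fixed to Matrix{Float64}; assigning a view
# converts it to a Matrix, so the field type never changes.
strict.xyz = view(ones(3, 3), 1:2, 1:2)
strict.xyz isa Matrix{Float64}
```

Because `Strict{Matrix{Float64}}` can only ever hold a `Matrix{Float64}`, the compiler can rely on that when it compiles code that reads the field.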


Yes, an AbstractArray{Float64, 2} can be anything. Here is one:

struct MyArray <: AbstractArray{Float64, 2}
    v::String
end

Clearly, you can’t infer anything just being given AbstractArray{Float64, 2}.


Really? Am I not saying that the abstract array stores Float64?


As I showed, there is nothing in the language semantics that prevents you from defining such an Array. AbstractArray is just an abstract type, the compiler doesn’t know anything special about it.


There’s something I’m seriously confused about: what is the type parameter then for?

Dispatch
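
For instance, here is a minimal sketch of methods selected by the element-type parameter. The function `describe` is a hypothetical name used only for illustration.

```julia
# Methods chosen by the element-type parameter of the abstract array type.
describe(::AbstractArray{Float64})   = "floating-point array"
describe(::AbstractArray{<:Integer}) = "integer array"
describe(::AbstractArray)            = "some other array"

describe(rand(3))        # "floating-point array"
describe(1:3)            # "integer array" (a range is an AbstractArray too)
describe(["a", "b"])     # "some other array"
```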

Right. Makes sense.


This is interesting:

julia> struct Atype
       x::AbstractVector{Float64}
       end

julia> struct Ctype
       x::Vector{Float64}
       end

julia> function test(a)
       r = zero(eltype(a.x))
       for el in a.x
       r+=el
       end
       return r
       end
test (generic function with 1 method)

julia> aa = Atype(rand(100));

julia> ca = Ctype(rand(100));

julia> @code_warntype test(aa)
Variables:
  #self# <optimized out>
  a::Atype
  el::Any
  #temp#::Any
  r::Any

Body:
  begin
      r::Any = (Base.sitofp)(Float64, 0)::Float64 # line 3:
      SSAValue(0) = (Core.getfield)(a::Atype, :x)::AbstractArray{Float64,1}
      #temp#::Any = (Base.start)(SSAValue(0))::Any
      5:
      unless !((Base.done)(SSAValue(0), #temp#::Any)::Any)::Any goto 14
      SSAValue(1) = (Base.next)(SSAValue(0), #temp#::Any)::Any
      el::Any = (Core.getfield)(SSAValue(1), 1)::Any
      #temp#::Any = (Core.getfield)(SSAValue(1), 2)::Any # line 4:
      r::Any = (r::Any + el::Any)::Any
      12:
      goto 5
      14:  # line 6:
      return r::Any
  end::Any

julia> @code_warntype test(ca)
Variables:
  #self# <optimized out>
  a::Ctype
  el::Float64
  #temp#::Int64
  r::Float64

Body:
  begin
      r::Float64 = (Base.sitofp)(Float64, 0)::Float64 # line 3:
      SSAValue(0) = (Core.getfield)(a::Ctype, :x)::Array{Float64,1}
      #temp#::Int64 = 1
      5:
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(SSAValue(0))::Int64, 1)::Int64)::Bool)::Bool goto 15
      SSAValue(2) = (Base.arrayref)(SSAValue(0), #temp#::Int64)::Float64
      SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
      el::Float64 = SSAValue(2)
      #temp#::Int64 = SSAValue(3) # line 4:
      r::Float64 = (Base.add_float)(r::Float64, el::Float64)::Float64
      13:
      goto 5
      15:  # line 6:
      return r::Float64
  end::Float64

I also thought that using AbstractVector{Float64} would help a little with the inference of r (while still leaving a non-performant iteration), since eltype(x::AbstractVector{T}) where {T} = T, but it didn’t help.
So I guess using AbstractVector{Float64} as the field type will not help inference at all and is useful only for dispatch, as Chris said.


This is only a fallback. The actual array you have in Atype might have extended this method.


I still can’t understand: the element type is apparently Float64,

julia> struct MyArray <: AbstractArray{Float64, 2}
       v::String
       end

julia> a = MyArray("some string");

julia> eltype(a)
Float64

but clearly no floating-point value is stored. What is going on? Isn’t the type of the abstract array enforced in some way?

Now it occurred to me that I may know the answer to my question: the string v could store the matrix, and the abstract-array API could be realized by parsing the string. Am I close?
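
That idea can indeed be made to work. Below is a rough sketch, assuming a made-up text format (rows separated by `;`, entries by spaces). The type `ParsedMatrix` and the helper `rows` are hypothetical; only `size` and `getindex` are the methods Base expects from an array type.

```julia
# A matrix whose only storage is a string such as "1 2; 3 4".
struct ParsedMatrix <: AbstractArray{Float64, 2}
    v::String
end

# Hypothetical helper: parse the string into a vector of rows on every call.
rows(m::ParsedMatrix) = [parse.(Float64, split(r)) for r in split(m.v, ';')]

# Minimal array interface: size plus scalar indexing.
Base.size(m::ParsedMatrix) = (length(rows(m)), length(first(rows(m))))
Base.getindex(m::ParsedMatrix, i::Int, j::Int) = rows(m)[i][j]

m = ParsedMatrix("1 2; 3 4")
m[2, 1]        # 3.0
sum(m)         # 10.0
collect(m)     # an ordinary 2×2 Matrix{Float64}
```

This is slow, since the string is re-parsed on every access, but it shows that nothing about AbstractArray{Float64, 2} dictates how the values are stored.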

Abstract types, parametric or not, are just organizational umbrellas having many concrete types under them. This allows defining a function for all concrete types under a certain umbrella in the same way, leaving the specialization of the function for each concrete type to multiple dispatch. For example, a function which accepts 2 AbstractArrays and sums them would specialize by dispatching to a different + for (1:3)+(1:3) than for [1,2,3]+[1,2,3]. So nothing stops you from putting your own custom type under any umbrella you want. Of course if you are subtyping a parametric abstract type with type variables, the type variables of the abstract type must show up in your custom type, e.g. struct MyArray{T} <: AbstractArray{T, 2}. But nothing stops you from doing something like this:

struct MyArray{T} <: AbstractArray{T, 2}
     v::Char
end

which may or may not make sense in any practical context.
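
The range-versus-vector case mentioned above can be checked directly. This sketch uses only Base; the exact range type returned may differ between Julia versions, so it only tests for AbstractRange.

```julia
a = (1:3) + (1:3)            # handled by a method of + for ranges
b = [1, 2, 3] + [1, 2, 3]    # handled by a method of + for arrays

a isa AbstractRange          # still a lazy range, no elements stored
b isa Vector{Int}            # a freshly allocated vector
a == b                       # same values either way

# The two calls dispatch to different methods of the same function.
which(+, (typeof(1:3), typeof(1:3))) != which(+, (Vector{Int}, Vector{Int}))
```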


eltype is just a function that happens to have a definition

eltype(::AbstractArray{T, N}) where {T,N} = T

in Base. I could define eltype(::MyArray) = 5 and that’s what you would get if you did eltype(A) where A::MyArray.

There seems to be some confusion here between the semantics of the language itself and an interface defined by humans. That eltype should return the element type you get by indexing is a convention imposed by humans, not by the compiler. Humans write functions with this in mind, and if you have an array that breaks this convention, it will likely not work well with these functions. However, the compiler doesn’t care. For the compiler, there are just types and functions, and whatever method gets dispatched to is what gets executed.
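
A minimal sketch of that point, assuming Julia 1.x. The types `Plain` and `Overridden` are hypothetical; neither implements the rest of the array interface, so the instances are only passed to eltype and never displayed.

```julia
struct Plain <: AbstractArray{Float64, 2}
    v::String
end

struct Overridden <: AbstractArray{Float64, 2}
    v::String
end

# Break the convention on purpose: eltype of an instance returns a number.
Base.eltype(::Overridden) = 5

eltype(Plain("a"))         # Float64, from the fallback in Base
eltype(Overridden("a"))    # 5, because the more specific method wins
```

The compiler accepts both; it is only generic code relying on the eltype convention that would misbehave with `Overridden`.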

There are actually not that many functions in Julia that are “special”. An example is ===, which cannot be extended. But even something as basic as +(::Int, ::Int) can technically be redefined! Although if it doesn’t return the correct value, things break very fast.
