If I define a field inside a struct with an abstract type (for instance AbstractArray), am I making it unnecessarily hard for inference to guess the type?
Inference will “fail”. I think it doesn’t matter for inference whether you use AbstractArray or Any. So you have to use a parameter.
(Note though that if you don’t use that field inside a function, it has no negative impact on performance)
My guess is that, to get performant code, you always have to do the work on that field in a separate function (a function barrier), so that the inner function is specialized on the field’s concrete runtime type instead of working on a boxed value.
julia> struct Test1
           x::AbstractArray
       end
julia> function notgood(a::Test1)
           r = zero(eltype(a.x))
           for el in a.x
               r += el
           end
           return r
       end
notgood (generic function with 1 method)
julia> a = Test1(rand(100));
julia> @code_warntype notgood(a)
Variables:
#self# <optimized out>
a::Test1
el::Any
#temp#::Any
r::Any
Body:
begin
r::Any = (Main.zero)((Base.eltype)((Base.typeof)((Core.getfield)(a::Test1, :x)::AbstractArray)::Type{#s45} where #s45<:AbstractArray{T,N} where N where T)::Any)::Any # line 3:
SSAValue(0) = (Core.getfield)(a::Test1, :x)::AbstractArray
#temp#::Any = (Base.start)(SSAValue(0))::Any
5:
unless !((Base.done)(SSAValue(0), #temp#::Any)::Any)::Any goto 14
SSAValue(1) = (Base.next)(SSAValue(0), #temp#::Any)::Any
el::Any = (Core.getfield)(SSAValue(1), 1)::Any
#temp#::Any = (Core.getfield)(SSAValue(1), 2)::Any # line 4:
r::Any = (r::Any + el::Any)::Any
12:
goto 5
14: # line 6:
return r::Any
end::Any
julia> function good(a::Test1)
           return _good(a.x)
       end
good (generic function with 1 method)
julia> function _good(x::AbstractArray)
           r = zero(eltype(x))
           for el in x
               r += el
           end
           return r
       end
_good (generic function with 1 method)
julia> @code_warntype _good(a.x)
Variables:
#self# <optimized out>
x::Array{Float64,1}
el::Float64
#temp#::Int64
r::Float64
Body:
begin
r::Float64 = (Base.sitofp)(Float64, 0)::Float64 # line 3:
#temp#::Int64 = 1
4:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(x::Array{Float64,1})::Int64, 1)::Int64)::Bool)::Bool goto 14
SSAValue(2) = (Base.arrayref)(x::Array{Float64,1}, #temp#::Int64)::Float64
SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
el::Float64 = SSAValue(2)
#temp#::Int64 = SSAValue(3) # line 4:
r::Float64 = (Base.add_float)(r::Float64, el::Float64)::Float64
12:
goto 4
14: # line 6:
return r::Float64
end::Float64
julia> @code_warntype good(a)
Variables:
#self# <optimized out>
a::Test1
Body:
begin
return (Main._good)((Core.getfield)(a::Test1, :x)::AbstractArray)::Any
end::Any
julia> using BenchmarkTools
julia> @benchmark notgood(a)
BenchmarkTools.Trial:
memory estimate: 6.27 KiB
allocs estimate: 301
--------------
minimum time: 12.749 μs (0.00% GC)
median time: 13.057 μs (0.00% GC)
mean time: 14.145 μs (4.18% GC)
maximum time: 2.090 ms (96.06% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark good(a)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 128.816 ns (0.00% GC)
median time: 130.499 ns (0.00% GC)
mean time: 142.417 ns (0.44% GC)
maximum time: 2.576 μs (86.30% GC)
--------------
samples: 10000
evals/sample: 888
When I don’t know the exact type of a field, I usually make it a parameter of the struct:
struct Test2{T<:AbstractArray}
    x::T
end
Test2(x) = Test2{typeof(x)}(x)
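One way to check what the parameter buys you, as a minimal sketch: the names `LooseField`, `TightField`, and `sumfield` are hypothetical and chosen only for illustration, while `fieldtype`, `isconcretetype`, and `Base.return_types` are standard Julia reflection functions. The exact types reported by inference may differ between Julia versions.

```julia
# Hypothetical names for illustration; not taken from the thread.
struct LooseField
    x::AbstractArray              # abstract field type
end

struct TightField{T<:AbstractArray}
    x::T                          # concrete as soon as T is fixed
end

sumfield(a) = sum(a.x)

# The declared field type is all that inference knows when it reads a.x.
isconcretetype(fieldtype(LooseField, :x))                    # false
isconcretetype(fieldtype(TightField{Vector{Float64}}, :x))   # true

# Ask inference which return type it can prove for each argument type.
Base.return_types(sumfield, (LooseField,))
Base.return_types(sumfield, (TightField{Vector{Float64}},))
```

The first query should report a non-concrete type (typically `Any`), the second a concrete one, which matches the `@code_warntype` output shown earlier.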
Actually, I did specify the parameter.
mutable struct A
    xyz::AbstractArray{Float64,2}
end
Thanks, these are good points.
In this case this is what I would do:
mutable struct A{T<:AbstractArray{Float64,2}}
    xyz::T
end
I can see how this might be good for flexibility, but is it necessary for inferrability of the type?
It’s necessary for inferrability of the field. If you think about it, it has to be a necessary condition. If you can change at any time what kind of array is in there, i.e. the field is only declared as an abstract array and, for an instance a, you do a.xyz = @SMatrix [2 2; 1 1], then how is the compiler ever supposed to know at compile time that you won’t change the value at runtime? If the field is not strictly typed, then its type is runtime information, so of course it cannot be inferred.
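A small sketch of that argument. The struct names `LooseA` and `TightA` are hypothetical, and `Diagonal` from the `LinearAlgebra` standard library stands in for the `@SMatrix` from the post so that the example needs no external package.

```julia
using LinearAlgebra  # standard library; provides Diagonal

# Hypothetical struct names for illustration.
mutable struct LooseA
    xyz::AbstractArray{Float64,2}
end

mutable struct TightA{T<:AbstractArray{Float64,2}}
    xyz::T
end

loose = LooseA(zeros(2, 2))
loose.xyz isa Matrix{Float64}        # true
loose.xyz = Diagonal([1.0, 2.0])     # allowed: Diagonal is also an AbstractArray{Float64,2}
loose.xyz isa Diagonal               # true: the concrete type changed at runtime

tight = TightA(zeros(2, 2))          # T is fixed to Matrix{Float64}
tight.xyz = Diagonal([1.0, 2.0])     # converted to Matrix{Float64} on assignment
tight.xyz isa Matrix{Float64}        # true: the field type cannot change
```

With `LooseA`, the concrete type of `xyz` is only known once the program runs; with `TightA{Matrix{Float64}}`, it is part of the struct's type and therefore available to inference.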
Type parameters should be thought of as building a whole class or family of related types. Each type parameter makes a separate type. This specific type is like A, but it has the property that xyz has to be a specific type, which then allows inference.
Yes, an AbstractArray{Float64, 2} can be anything. Here is one:
struct MyArray <: AbstractArray{Float64, 2}
    v::String
end
Clearly, you can’t infer anything just being given AbstractArray{Float64, 2}.
Really? Am I not saying that the abstract array stores Float64?
As I showed, there is nothing in the language semantics that prevents you from defining such an array. AbstractArray is just an abstract type; the compiler doesn’t know anything special about it.
There’s something I’m seriously confused about: what is the type parameter then for?
Dispatch
Right. Makes sense.
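As a minimal illustration of the parameter being used for dispatch (the function name `describe` is made up for this sketch):

```julia
# Hypothetical function; the method chosen depends on the array's type parameters.
describe(::AbstractArray{Float64,2})   = "Float64 matrix"
describe(::AbstractArray{<:Integer,2}) = "integer matrix"
describe(::AbstractArray)              = "some other array"

describe(rand(2, 2))                        # "Float64 matrix"
describe([1 2; 3 4])                        # "integer matrix"
describe([1.0, 2.0])                        # "some other array"
describe(view(rand(3, 3), 1:2, 1:2))        # "Float64 matrix" (a SubArray, not an Array)
```

The last call shows that the method is selected from the abstract supertype alone, whatever the concrete storage is.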
This is interesting:
julia> struct Atype
           x::AbstractVector{Float64}
       end
julia> struct Ctype
           x::Vector{Float64}
       end
julia> function test(a)
           r = zero(eltype(a.x))
           for el in a.x
               r += el
           end
           return r
       end
test (generic function with 1 method)
julia> aa = Atype(rand(100));
julia> ca = Ctype(rand(100));
julia> @code_warntype test(aa)
Variables:
#self# <optimized out>
a::Atype
el::Any
#temp#::Any
r::Any
Body:
begin
r::Any = (Base.sitofp)(Float64, 0)::Float64 # line 3:
SSAValue(0) = (Core.getfield)(a::Atype, :x)::AbstractArray{Float64,1}
#temp#::Any = (Base.start)(SSAValue(0))::Any
5:
unless !((Base.done)(SSAValue(0), #temp#::Any)::Any)::Any goto 14
SSAValue(1) = (Base.next)(SSAValue(0), #temp#::Any)::Any
el::Any = (Core.getfield)(SSAValue(1), 1)::Any
#temp#::Any = (Core.getfield)(SSAValue(1), 2)::Any # line 4:
r::Any = (r::Any + el::Any)::Any
12:
goto 5
14: # line 6:
return r::Any
end::Any
julia> @code_warntype test(ca)
Variables:
#self# <optimized out>
a::Ctype
el::Float64
#temp#::Int64
r::Float64
Body:
begin
r::Float64 = (Base.sitofp)(Float64, 0)::Float64 # line 3:
SSAValue(0) = (Core.getfield)(a::Ctype, :x)::Array{Float64,1}
#temp#::Int64 = 1
5:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)((Base.arraylen)(SSAValue(0))::Int64, 1)::Int64)::Bool)::Bool goto 15
SSAValue(2) = (Base.arrayref)(SSAValue(0), #temp#::Int64)::Float64
SSAValue(3) = (Base.add_int)(#temp#::Int64, 1)::Int64
el::Float64 = SSAValue(2)
#temp#::Int64 = SSAValue(3) # line 4:
r::Float64 = (Base.add_float)(r::Float64, el::Float64)::Float64
13:
goto 5
15: # line 6:
return r::Float64
end::Float64
I also thought that using AbstractVector{Float64} would help a little with the inference of r (while still having a non-performant iteration), since eltype(x::AbstractVector{T}) where {T} = T, but it didn’t help.
So I guess using AbstractVector{Float64} for the field type will not help inference at all and will only matter for dispatch, as Chris said.
This is only a fallback. The actual array you have in Atype might have extended this method.
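A sketch of such an array, assuming a hypothetical type `Mislabeled` that deliberately breaks the usual convention. It is a legal subtype of AbstractVector{Float64}, so it could be stored in Atype, yet its own `eltype` method wins over the fallback:

```julia
# Hypothetical type for illustration: declared as a Float64 vector, but it stores Ints.
struct Mislabeled <: AbstractVector{Float64}
    data::Vector{Int}
end

# Minimal read-only array interface.
Base.size(m::Mislabeled) = size(m.data)
Base.getindex(m::Mislabeled, i::Int) = m.data[i]

# A more specific method than the generic AbstractArray fallback.
Base.eltype(::Type{Mislabeled}) = Int

m = Mislabeled([1, 2, 3])
m isa AbstractVector{Float64}   # true
eltype(m) === Int               # true: the fallback is not used
m[1] isa Int                    # true: indexing does not return a Float64
```

Because a type like this is allowed, the compiler cannot assume anything about `eltype(a.x)` or about what iteration returns when it only knows that the field is an AbstractVector{Float64}.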
I still can’t understand: the element type is apparently Float64,
julia> struct MyArray <: AbstractArray{Float64, 2}
           v::String
       end
julia> a = MyArray("some string");
julia> eltype(a)
Float64
but clearly no floating-point value is stored. What is going on? Isn’t the type of the abstract array enforced in some way?
Now it occurred to me that I may know the answer to my question: the string v could store the matrix, and the abstract-array API could be realized by parsing the string. Am I close?
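A minimal sketch of that idea, assuming a hypothetical `StringMatrix` type and a made-up text format (rows separated by `;`, entries by spaces). Only `size` and `getindex` are defined, which is what a read-only array needs; the helper `parse_rows` is also an invention of this sketch and re-parses the string on every call, so it is illustrative rather than efficient.

```julia
# Hypothetical type: a Float64 matrix whose only storage is a String.
struct StringMatrix <: AbstractArray{Float64,2}
    v::String
end

# Hypothetical helper: split "1 2; 3 4" into rows of Float64 values.
parse_rows(a::StringMatrix) =
    [parse.(Float64, split(row)) for row in split(a.v, ';')]

# Read-only array interface, realized by parsing the string.
Base.size(a::StringMatrix) = (length(parse_rows(a)), length(first(parse_rows(a))))
Base.getindex(a::StringMatrix, i::Int, j::Int) = parse_rows(a)[i][j]

m = StringMatrix("1 2; 3 4")
size(m)     # (2, 2)
m[2, 1]     # 3.0
sum(m)      # 10.0
```

Generic functions such as `sum` work here only because `size` and `getindex` were implemented; subtyping AbstractArray{Float64, 2} alone provides none of this.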
Abstract types, parametric or not, are just organizational umbrellas having many concrete types under them. This allows defining a function for all concrete types under a certain umbrella in the same way, leaving the specialization of the function for each concrete type to multiple dispatch. For example, a function which accepts 2 AbstractArrays and sums them would specialize by dispatching to a different + for (1:3)+(1:3) than for [1,2,3]+[1,2,3]. So nothing stops you from putting your own custom type under any umbrella you want. Of course if you are subtyping a parametric abstract type with type variables, the type variables of the abstract type must show up in your custom type, e.g. struct MyArray{T} <: AbstractArray{T, 2}. But nothing stops you from doing something like this:
struct MyArray{T} <: AbstractArray{T, 2}
    v::Char
end
which may or may not make sense in any practical context.
eltype is just a function that happens to have a definition
eltype(::AbstractArray{T, N}) where {T,N} = T
in Base. I could define Base.eltype(::MyArray) = 5 and that’s what you would get if you did eltype(A) where A::MyArray.
There seems to be some confusion here between the semantics of the language itself and an interface defined by humans. That eltype should return the element type you get by indexing is a convention imposed by humans, not the compiler. Humans write functions with this in mind, and if you have an array that breaks this convention, it will likely not work well with these functions. However, the compiler doesn’t care. For the compiler, it is just types and functions, and whatever method gets dispatched to is what gets executed.
There are actually not that many functions in Julia that are “special”. An example is ===, which cannot be extended. But even something as basic as +(::Int, ::Int) can technically be redefined! Although if it doesn’t return the correct value, things break very fast.