T is not the same for the vectors and matrices; I thought you wanted distributions in the matrices?
I do. I’m just trying to understand why some fields are parameterized and others are not. I understand that the big reason for parameterizing is to create a family of types, but parameterizing is also done for performance.
My point is that
struct Something{T}
x::T
y::T
end
means that x and y have to be the same type, so you’d need two type parameters here if you wanted, say, x to be a float and y to be a distribution.
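A minimal sketch of the point above. The names OneParam and TwoParams are hypothetical, and a String stands in for a distribution so the snippet runs without Distributions:

```julia
# Hypothetical struct: a single parameter forces both fields to share one type.
struct OneParam{T}
    x::T
    y::T
end

# Hypothetical struct: one parameter per independently typed field.
struct TwoParams{A,B}
    x::A
    y::B
end

q = OneParam(1.5, 2.5)
typeof(q)    # OneParam{Float64}

# The String is only a stand-in for a distribution object.
p = TwoParams(1.5, "stand-in for a distribution")
typeof(p)    # TwoParams{Float64, String}

# OneParam(1.5, "text") would throw a MethodError, since no single T fits both fields.
```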
Sure. But one point that is made about parameterization is that it prevents boxing and thus reduces allocations and speeds things up. So is it best to parameterize every field in a type, or just some of them?
struct myStruct{T}
a::Float64
b::Vector{T}
end
vs.
struct myStruct{F,T}
a::F
b::Vector{T}
end
It’s not the parameterization itself that allows the compiler to optimise, though, but rather that every field ends up with a concrete type.
In your example, both structs will have the same performance in general; the second is just more flexible.
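As a rough check of that claim, here is a sketch with the two definitions side by side. The names FixedA and FreeA are hypothetical (two structs cannot share one name in a single session), and plain Float64 elements stand in for distributions:

```julia
# Hypothetical name: field `a` is hard-coded, only the element type is a parameter.
struct FixedA{T}
    a::Float64
    b::Vector{T}
end

# Hypothetical name: both the type of `a` and the element type are parameters.
struct FreeA{F,T}
    a::F
    b::Vector{T}
end

s1 = FixedA(5.5, fill(0.1, 5))
s2 = FreeA(5.5, fill(0.1, 5))

# Once the parameters are filled in, every field type is concrete in both cases.
all(isconcretetype, fieldtypes(typeof(s1)))   # true
all(isconcretetype, fieldtypes(typeof(s2)))   # true

# Only the second definition accepts a different type for `a`.
s3 = FreeA(1f0, fill(0.1, 5))
typeof(s3)                                    # FreeA{Float32, Float64}
```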
thinking…
so pt below will be concrete even though its field b is not concrete?
pt = myStruct(5.5,fill(Gamma(10,0.1),5))
As a simple example:
struct myType
x::Float64
end
pt = myType(3.2)
typeof(pt) # output: myType
struct myType{T}
x::T
end
pt = myType(3.2)
typeof(pt) # output: myType{Float64}
Isn’t the second type more efficient "because the second version specifies the type of x from the type of the wrapper object" (from the manual)?
I guess my question boils down to: How do I know which fields in a composite type I should parameterize, strictly from a performance point of view?
Basically, there are concrete types, i.e., types of which actual values exist, and abstract types, of which no values exist. As an example, consider numbers: Int64 and Float64 are concrete types, as witnessed by the values 1 and 1.0 with typeof(1) == Int64 and typeof(1.0) == Float64. There is no value of type Real, though, i.e., there is no x such that typeof(x) == Real.
Containers, such as Vectors or structs, are a bit more complicated, as they can hold concrete as well as abstract types as elements. First, some examples with concrete types:
julia> using Distributions
julia> v = [Gamma(1.0, 1.1), Gamma(1.2, 1.3)];
julia> v |> typeof
Vector{Gamma{Float64}} (alias for Array{Gamma{Float64}, 1})
julia> v |> typeof |> isconcretetype
true # as witnessed by the value v
julia> v |> eltype
Gamma{Float64}
julia> v |> eltype |> isconcretetype
true
# Gamma is a parameterized struct, i.e., it has an eltype as well
julia> v |> eltype |> eltype
Float64
julia> v |> eltype |> eltype |> isconcretetype
true
Thus, the types of all containers (the vector and the Gamma struct) as well as of their elements are concrete. In this case, the compiler can optimize, as the type of v[i].α can be inferred independently of the index i.
Now, consider a vector holding several different distributions:
julia> w = [Gamma(1.0, 2.0), Normal(1.0, 2.0), TDist(3.2)];
julia> w |> typeof
Vector{Distribution{Univariate, Continuous}}
julia> w |> typeof |> isconcretetype
true # Sure the value w is our witness
julia> w |> eltype
Distribution{Univariate, Continuous}
julia> w |> eltype |> isconcretetype
false
# Yet, for any specific element
julia> w[2] |> typeof
Normal{Float64}
julia> w[2] |> typeof |> isconcretetype
true # The element value does indeed exist
How can this be? Each element of the vector is a value with a concrete type, yet the element type of the vector is abstract:
- It implies that the compiler has less information; in particular, the type of w[2] cannot be inferred at compile time and requires a runtime dispatch.
- The vector cannot store its elements inline, as they might have different sizes and memory layouts. Instead, it holds pointers to values whose types are compatible with the eltype, i.e., the actual type of a value is not known until runtime, when the value is retrieved.
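The same contrast can be reproduced with Base types only. This is a small sketch with hypothetical variable names, using isbitstype as a rough indicator of whether elements are plain data that an array can store inline:

```julia
# Hypothetical variable names; no packages needed.
concrete_vec = [1.0, 2.0, 3.0]        # Vector{Float64}
abstract_vec = Real[1.0, 2, 3//1]     # Vector{Real}

isconcretetype(eltype(concrete_vec))  # true
isconcretetype(eltype(abstract_vec))  # false

# isbitstype is true for immutable plain-data types such as Float64.
isbitstype(eltype(concrete_vec))      # true
isbitstype(eltype(abstract_vec))      # false

# Each element of the abstract vector still has its own concrete runtime type.
elem_types = map(typeof, abstract_vec)
all(isconcretetype, elem_types)       # true
```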
In Rust, v would be a Vec<Gamma<f64>>, whereas w would be a Vec<Box<dyn Distribution>>, explicitly stating the different memory layout as well as the required runtime dispatch.
For this example there is no difference, as Float64 is a concrete type, and accordingly the two versions above (call them myType1 and myType2{Float64}) are effectively the same.
The type of a field should be parameterized whenever it would be abstract, e.g.,
struct myTypeA{T<:Real}
x::T
end
instead of
struct myTypeB
x::Real
end
Both allow you to create a struct holding any real, but the first allows the compiler to distinguish the (concrete) types myTypeA{Int64}, myTypeA{Float16}, …
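A short sketch of how this difference shows up when inspecting the field types. ParamReal and AbstractReal are hypothetical names mirroring the two definitions above:

```julia
# Hypothetical name: the field type is a parameter restricted to subtypes of Real.
struct ParamReal{T<:Real}
    x::T
end

# Hypothetical name: the field is declared with the abstract type Real.
struct AbstractReal
    x::Real
end

# The parameterized version carries a concrete field type in the wrapper type.
fieldtype(ParamReal{Int64}, :x)                   # Int64
isconcretetype(fieldtype(ParamReal{Int64}, :x))   # true

# The non-parameterized version only knows that x is some Real.
fieldtype(AbstractReal, :x)                       # Real
isconcretetype(fieldtype(AbstractReal, :x))       # false

# Different inputs give different wrapper types in the first case only.
typeof(ParamReal(1)) == typeof(ParamReal(1.0))         # false
typeof(AbstractReal(1)) == typeof(AbstractReal(1.0))   # true
```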
Great explanation. Thank you. So using a type with a field holding your second example would be less efficient than the case where all of the distributions are of the same type. And there’s no way around this inefficiency unless you just redesign the whole code structure?
The manual gives the distinct impression that if typeof() produces just the type’s name, with no parameter information (myType vs myType{Float64}), it is a less efficient data structure because it can’t infer the types of its fields.
No, the compiler can indeed know the type of the field, because that type is right there in the definition, which the compiler can see (the compiler reads your code, you know).
The parameter in myType{Float64} is necessary to tell what T actually is, but that isn’t needed in the other type definition, since it’s hard-coded.
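One way to convince yourself of this is to compare what the compiler knows about the field in each case. The following is a sketch with hypothetical names HardCodedField and ParamField, so that both definitions can live in one session:

```julia
# Hypothetical name: the field type is written directly in the definition.
struct HardCodedField
    x::Float64
end

# Hypothetical name: the field type comes from the type parameter.
struct ParamField{T}
    x::T
end

h = HardCodedField(3.2)
p = ParamField(3.2)

typeof(h)   # HardCodedField
typeof(p)   # ParamField{Float64}

# In both cases the field type is the concrete type Float64.
fieldtype(typeof(h), :x) == fieldtype(typeof(p), :x) == Float64   # true

# Both are plain-data types of the same size.
isbitstype(typeof(h)) && isbitstype(typeof(p))                    # true
sizeof(typeof(h)) == sizeof(typeof(p))                            # true
```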