Naively, I would expect x::Real to be slower than x::T where T<:Real. The first one requires work to determine what sort of value it is, the second one uses concrete type T.
They’re exactly the same - both have to figure out which type the object is to specialize the function during compilation. The latter just allows you to use T (i.e. the actual type) in your function as well, if you require it.
No. T will always be a concrete type, as that’s the only thing an object can have at runtime. abstract or Union types are only relevant for inference and dispatch, in a method it will be concrete.
The only time such a T can be non-concrete is when you explicitly pass that in via an already non-concrete type parameter, like here:
Old example with `Type`
julia> f(::Type{T}) where T = T
f (generic function with 1 method)
julia> f(Type{Integer})
Type{Integer}
julia> f(::Vector{T}) where T = T
f (generic function with 1 method)
julia> f(Integer[])
Integer
Another thing that feels tricky to me is that:
It is considered good/okay to use abstract type in function arguments. As @Sukera mentioned, a version of the method specialized to the type you actually used is created and compiled when you call the function.
However, it is considered bad when using abstract types in the fields of a struct. I just read it yesterday on the Performance tips page. It says if it is a mutable struct, the field with abstract type can change its type at runtime. However, I was a little bit surprised that for immutable structs, the compiler cannot do something like it does for method with abstract type.
For the following example, in S{Int64}, and s.a could be Vector{Int64} or SparseVector{Int64}. As a result, S{Int64} is like an abstract type itself. Imagine if you put it into a Vector, Vector{S{Int64}} is like Vector{AbstractVector{Int64}}. The memory layout of each AbstractVector{Int64} can be very different (for example, some are all on the stack, some are all on the heap.). Therefore, it is hard to optimize. Is that right?
struct S{T}
a::AbstractVector{T}
end
One more question, when you say specialization or non-specialization, is the example above non-specialization and the example below specialization:
Imprecise, depends on what you mean by “like”. S{Int64} is a concrete type. A concrete type that has a boxed field (or an abstract type field, if you prefer calling it that way), i.e., a field that may store different concrete types.
I believe that is not the case. Ever S{Int} will gave the same layout, a single boxed field (basically a void* pointer), and every S{Int} will store what is inside field a in the heap.
I used to think StaticArray, which is a subtype of AbstractArray, is stack-allocated. However, After reading this, I realize Julia does not guarantee either stack or heap allocation.
But I feel it is also not guaranteed to be on the heap as you mentioned.
There is no way Julia can guarantee every instance of a Type is either stack allocated or heap allocated, this is just not how things work.
The exact rules are hard to lay out but, for example, a Vector{SVector{Int, 3}} will have each SVector{Int, 3} unboxed and contiguous in memory (basically, the layout will be the same as a Vector{Int} three times the size), but this will be heap memory; because unless the compiler is able to do some very aggressive optimization all memory of any Vector ends is from the heap, independently from its size or the type of the objects inside it. If it wasn’t in the heap, the Vector could not be dynamic in size, because the stack works exactly as it says, it is a stack, the moment you define other object that goes in the stack it starts immediately after the last object has ended, and resizing the Vector would overwrite variables defined in the same scope with garbage (you can actually do this in C, by defining a static-size array in the stack but writing after its end onto other variables of the scope). This is the reason the StaticArrays variants can end up in the stack, they do not have dynamic length, however, they will often end up in the stack only if they are not attributed to a region of memory that is not already guaranteed to be in the heap, like inside a Vector.
The explanation is messier than I do like, but I hope it helps. Any questions just ask.
Thanks. That is very helpful. I think you are saying:
A user-defined subtype of AbstractArray may be stack-allocated, such as StaticArray.
A Vector{T} will be heap-allocated no matter what that T is, “unless the compiler is able to do some very aggressive optimization”
An example of this “very aggressive optimization” may happen if the compiler realizes the Vector is small and never change its size, but I guess this is just a necessary condition and not sufficient.
Yes, it is more or less what you pointed out. It is more complex in practice, and I will point out some of these subtleties but they are not necessary to understand in general:
Everything that is allocated in the heap in fact is often allocated both in the stack and the heap. For example, Vector{T} has its contents always in the heap, but the pointer to where in the heap it is and its size may be in the stack. This happens because the pointer itself and the Int storing the size are of fixed size, and if the Vector{T} does not escape the current scope (i.e., it is not passed to a non-inlined function nor is returned) then these info can be stored in the stack. If the Vector{T} is stored in the heap itself (not just its contents), then you have in the stack just the pointer to the pointer+length block which then points to the contents, i.e., a double indirection.
When Vector{SArray{Int, 3}} has its content (the SArrays) in the heap, they are inlined and contiguous, so even being in the heap is not that bad. The problem happens if it was a Vector{Vector{Int}} instead, for example, then what would be contiguous inside the outer vector is either the single pointers to the pointer+length blocks, or the pointer+length block themselves. This means already 3~4 layers of indirection, and this is one thing that starts slowing the code, because now accessing x[1][1] may mean to access the pointer x that leads to the pointer+length block of x and from it get the position of x[1] that contains just a pointer that you follow to the pointer+length block of x[1] and now you can finally follow that last pointer to know where is x[1][1] stored and fetch it. If the outer vector does not escape the scope and has SVector{Int} inside it, then the only indirection is the pointer+length block of x and the second index just offset the position in the same block of memory (all SVector contents are contiguously stored directly into the x block of memory, instead of pointers to the SVectors).
It is confusing, I know. Often pointers is where computer sci students start having problems.