Syntax for putting constraints on types

I have listed the syntax for putting constraints on types in different places, such as methods and structs. Please let me know whether this is correct.

parametric type:

  • Vector{<:Real}
  • Vector{T} where T <: Real

method

  • function f(x::T) where {T<:Real}
  • function f(x::T where T<:Real)

struct

  • struct S{T<:Real}
  • where is not an option

I just feel it is a little bit inconsistent, so I listed them out to check whether my understanding is correct (a quick runnable check is sketched below).
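Just a sketch to confirm each form is accepted; the names f, g, and S1 are arbitrary:

julia> Vector{<:Real} == (Vector{T} where T<:Real)   # the two parametric spellings denote the same type
true

julia> f(x::T) where {T<:Real} = x;    # method form with `where` outside the signature

julia> g(x::T where T<:Real) = x;      # method form with `where` inside the argument annotation

julia> struct S1{T<:Real}              # struct form; a trailing `where` clause is indeed not accepted
           x::T
       end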

3 Likes

For methods, you are missing the simplest one: function f(x::Real)

1 Like

Many languages have these inconsistencies. It goes with the territory.

Naively, I would expect x::Real to be slower than x::T where T<:Real. The first one requires work to determine what sort of value it is, while the second one uses the concrete type T.

They’re exactly the same - both have to figure out which type the object is to specialize the function during compilation. The latter just allows you to use T (i.e. the actual type) in your function as well, if you require it.
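A minimal sketch of this point (the function names here are made up): both definitions get compiled for the concrete argument type, and the `where` form merely gives that type a name inside the body.

f_plain(x::Real) = x + one(x)
f_where(x::T) where {T<:Real} = x + one(T)   # T === typeof(x) inside the body

f_plain(2) == f_where(2)   # true; both calls run code specialized for Int64
# @code_llvm f_plain(2) and @code_llvm f_where(2) show essentially the same IR.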

6 Likes

Besides, in this case T doesn’t have to be a concrete type, it can be abstract as well
EDIT: this is wrong, see below

No. T will always be a concrete type, as that's the only thing an object can have at runtime. Abstract or Union types are only relevant for inference and dispatch; inside the method body, T will be concrete.
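A small check of this (a sketch; the function name is arbitrary): T binds to the concrete runtime type even though the bound is abstract.

whichT(x::T) where {T<:Real} = T

whichT(1)     # Int64, not Real
whichT(1.0)   # Float64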

The only time such a T can be non-concrete is when you explicitly pass that in via an already non-concrete type parameter, like here:

Old example with `Type`
julia> f(::Type{T}) where T = T
f (generic function with 1 method)

julia> f(Integer)
Integer

julia> f(::Vector{T}) where T = T
f (generic function with 1 method)

julia> f(Integer[])
Integer

1 Like

You could have used Vector for this example, Type is kinda special and may confuse readers.

1 Like

My bad, today I learned!

Another thing that feels tricky to me is that:
It is considered good/okay to use abstract types in function arguments. As @Sukera mentioned, a version of the method specialized to the type you actually used is created and compiled when you call the function.
However, it is considered bad to use abstract types in the fields of a struct. I just read that yesterday on the Performance Tips page. It says that if it is a mutable struct, a field with an abstract type can change its type at runtime. However, I was a little surprised that for immutable structs, the compiler cannot do something like it does for a method with abstract argument types.

It could, but keeping the two distinct behaviors lets you ask for either specialization or non-specialization, which is a useful distinction.

1 Like

On second thought, maybe I was wrong.

  • For the following example, in S{Int64}, s.a could be a Vector{Int64} or a SparseVector{Int64}. As a result, S{Int64} is like an abstract type itself. Imagine putting it into a Vector: Vector{S{Int64}} is like Vector{AbstractVector{Int64}}. The memory layout of each AbstractVector{Int64} can be very different (for example, some are entirely on the stack, some are entirely on the heap), so it is hard to optimize. Is that right?
struct S{T}
    a::AbstractVector{T}
end

One more question: when you say specialization or non-specialization, is the example above the non-specialization case and the example below the specialization case:

struct S{T<:AbstractVector}
    a::T
end

Imprecise; it depends on what you mean by “like”. S{Int64} is a concrete type: a concrete type that has a boxed field (or an abstract-type field, if you prefer to call it that), i.e., a field that may store values of different concrete types.
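A quick way to check this, using the struct S{T} defined above (just a sketch):

isconcretetype(S{Int64})                  # true: S{Int64} itself is concrete
isconcretetype(fieldtype(S{Int64}, :a))   # false: the field type AbstractVector{Int64} is abstract
S(1:3).a isa AbstractVector{Int64}        # true: any AbstractVector{Int64} can be stored in the field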

I believe that is not the case. Every S{Int} will have the same layout, a single boxed field (basically a void* pointer), and every S{Int} will store whatever is inside field a on the heap.

I may be wrong, but I think that what @StefanKarpinski referred to is a special behavior just for methods. See: Performance Tips · The Julia Language

3 Likes

I used to think a StaticArray, which is a subtype of AbstractArray, is stack-allocated. However, after reading this, I realize Julia does not guarantee either stack or heap allocation.
But I feel it is also not guaranteed to be on the heap, as you mentioned.

There is no way Julia can guarantee that every instance of a type is either stack-allocated or heap-allocated; this is just not how things work.

The exact rules are hard to lay out, but, for example, a Vector{SVector{3, Int}} will have each SVector{3, Int} unboxed and contiguous in memory (basically, the layout will be the same as a Vector{Int} three times the size), yet this will be heap memory. Unless the compiler is able to do some very aggressive optimization, all the memory of any Vector comes from the heap, independently of its size or the type of the objects inside it. If it weren't on the heap, the Vector could not be dynamic in size: the stack works exactly as the name says, it is a stack, so the moment you define another object that goes on the stack it starts immediately after the last object has ended, and resizing the Vector would overwrite variables defined in the same scope with garbage (you can actually do this in C, by defining a fixed-size array on the stack and writing past its end onto other variables of the scope).

This is the reason the StaticArrays variants can end up on the stack: they do not have dynamic length. However, they will often end up on the stack only if they are not placed into a region of memory that is already guaranteed to be on the heap, like inside a Vector.
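A small check of the layout claim (a sketch; it assumes the StaticArrays package is installed):

using StaticArrays

isbitstype(SVector{3, Int})               # true: fixed size, no pointers, so elements can be stored inline
v = [SVector(1, 2, 3), SVector(4, 5, 6)]
sizeof(v)                                 # 48: two elements of 3 Ints each, in one contiguous heap buffer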

The explanation is messier than I would like, but I hope it helps. Any questions, just ask.

1 Like

Thanks. That is very helpful. I think you are saying:

  1. A user-defined subtype of AbstractArray may be stack-allocated, such as StaticArray.
  2. A Vector{T} will be heap-allocated no matter what that T is, “unless the compiler is able to do some very aggressive optimization”
  3. An example of this “very aggressive optimization” may happen if the compiler realizes the Vector is small and never changes its size, but I guess this is just a necessary condition and not a sufficient one (a small check is sketched below).
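One way to see the difference in practice (a sketch, not a guarantee; it assumes StaticArrays is installed, and whether a small Vector allocation can be elided depends on the Julia version and escape analysis):

using StaticArrays

f_svec() = sum(SVector(1, 2, 3))   # fixed-size, isbits: no heap allocation needed
f_vec()  = sum([1, 2, 3])          # a real Vector: its buffer lives on the heap
f_svec(); f_vec();                 # run once first so compilation is not measured

@allocated f_svec()                # typically 0
@allocated f_vec()                 # > 0 (the Vector's heap buffer)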

Yes, it is more or less what you pointed out. It is more complex in practice, and I will point out some of the subtleties below, but they are not necessary for a general understanding:

  1. Everything that is allocated on the heap is in fact often allocated both on the stack and the heap. For example, a Vector{T} always has its contents on the heap, but the pointer to where on the heap they are and the length may live on the stack. This works because the pointer itself and the Int storing the length have a fixed size, and if the Vector{T} does not escape the current scope (i.e., it is not passed to a non-inlined function nor returned), then this information can be kept on the stack. If the Vector{T} itself is stored on the heap (not just its contents), then the stack holds only a pointer to the pointer+length block, which then points to the contents, i.e., a double indirection.
  2. When a Vector{SVector{3, Int}} has its contents (the SVectors) on the heap, they are inlined and contiguous, so even being on the heap is not that bad. The problem appears with a Vector{Vector{Int}} instead: then what is contiguous inside the outer vector is either the single pointers to the pointer+length blocks, or the pointer+length blocks themselves. That already means 3~4 layers of indirection, and this is one of the things that starts slowing code down, because accessing x[1][1] may mean: follow the pointer x to the pointer+length block of x, from it find the position of x[1], which contains just a pointer, follow that pointer to the pointer+length block of x[1], and only then follow the last pointer to where x[1][1] is stored and fetch it. If the outer vector does not escape the scope and has SVector{3, Int} elements, then the only indirection is the pointer+length block of x, and the second index just offsets the position within that same block of memory (all the SVector contents are stored contiguously directly in the block of memory of x, instead of as pointers to the SVectors). A small sketch contrasting the two layouts follows below.
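Here is the sketch contrasting the two layouts (again assuming StaticArrays is installed; the variable names are made up):

using StaticArrays

flat   = [SVector(1, 2, 3) for _ in 1:4]   # isbits elements, stored inline in one buffer
nested = [[1, 2, 3] for _ in 1:4]          # each element is a separately allocated Vector

isbitstype(eltype(flat))     # true: flat[i][j] is just an offset into flat's single buffer
isbitstype(eltype(nested))   # false: nested[i][j] needs an extra pointer chase per level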

It is confusing, I know. Pointers are often where computer science students start having problems.

1 Like

Thank you for the example. That is helpful!