Annotating Types: Best practice for beginners

As a beginner, I’m slightly unsure when I should annotate my code with types. I’ve read the section on types in the Julia manual and I’m happy with the basics (type hierarchy, parametric types, avoiding abstract types, and the importance of type stability), but I’m still unsure how I should be using type annotations in practice.

For instance, from the QuantEcon website:

You will notice that in the lecture notes we have never directly declared any types. This is intentional both for exposition and as a best practice for using packages (as opposed to writing new packages, where declaring these types is very important). It is also in contrast to some of the sample code you will see in other Julia sources, which you will need to be able to read. To give an example of the declaration of types, the following are equivalent. While declaring the types may be verbose, would it ever generate faster code? The answer is almost never.

function f(x, A)
    b = [5.0, 6.0]
    return A * x .+ b
end

val = f([0.1, 2.0], [1.0 2.0; 3.0 4.0])

function f2(x::Vector{Float64}, A::Matrix{Float64})::Vector{Float64}
    # argument and return types
    b::Vector{Float64} = [5.0, 6.0]
    return A * x .+ b
end

val = f2([0.1; 2.0], [1.0 2.0; 3.0 4.0])
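
One way to see what the annotations in the quoted example actually change is to call both versions with other input types. The sketch below repeats `f` and `f2` from the quote so it is self-contained; the variable names `same_result`, `generic_result` and `f2_rejects_ints` are only for illustration.

```julia
# Untyped version from the quoted example.
function f(x, A)
    b = [5.0, 6.0]
    return A * x .+ b
end

# Fully annotated version from the quoted example.
function f2(x::Vector{Float64}, A::Matrix{Float64})::Vector{Float64}
    b::Vector{Float64} = [5.0, 6.0]
    return A * x .+ b
end

# For Float64 inputs both run the same computation.
same_result = f([0.1, 2.0], [1.0 2.0; 3.0 4.0]) == f2([0.1, 2.0], [1.0 2.0; 3.0 4.0])

# The untyped version also accepts integer inputs ...
generic_result = f([1, 2], [1 2; 3 4])

# ... while the annotated version has no matching method for them.
f2_rejects_ints = try
    f2([1, 2], [1 2; 3 4])
    false
catch err
    err isa MethodError
end
```

In this sketch the annotations restrict which inputs are accepted rather than changing what is computed for `Float64` inputs.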

However, I have seen other sources that suggest that annotating code with type information is the key to performance in Julia. For instance, I’m told that if I preallocate a vector I should always include type information.

I’m sure there’s a lot I’m not fully understanding, but if someone could provide some tips or general guidance on these issues, it would be very helpful for beginners like me.

  • Annotate types inside of type definitions for performance.
  • Annotate types inside of function definitions to exploit multiple dispatch or to prevent duck typing.

Basically, this is good for performance:

struct Foo
    x::Int64
    y::Float64
end

And this is bad:

struct Foo
    x
    y
end

In contrast, performance is the same for either of these when called on z of type Foo:

function bar(z)
    z.x + z.y
end

function bar(z::Foo)
    z.x + z.y
end
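
As a rough sketch of the difference, the field types of both struct variants can be inspected directly. The names `TypedFoo`, `UntypedFoo` and `bar_sum` are hypothetical, introduced here only so that both definitions can coexist in one session.

```julia
# Hypothetical stand-ins for the two Foo definitions above.
struct TypedFoo
    x::Int64
    y::Float64
end

struct UntypedFoo
    x   # no annotation: the field type defaults to Any
    y
end

# No argument annotation; the same method body is used for both structs.
bar_sum(z) = z.x + z.y

typed_fields   = fieldtypes(TypedFoo)    # field types as declared
untyped_fields = fieldtypes(UntypedFoo)  # Any for every unannotated field

# Concrete field types are what let the compiler know what z.x and z.y are.
typed_is_concrete   = all(isconcretetype, typed_fields)
untyped_is_concrete = all(isconcretetype, untyped_fields)
```

Running `@code_warntype bar_sum(UntypedFoo(1, 2.0))` in the REPL should highlight the `Any` fields, whereas the `TypedFoo` call should show concrete types throughout.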

Adding to this, it’s equally fast to use parametric types like this instead of explicit types:

struct Foo{X,Y}
    x::X
    y::Y
end
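
A minimal sketch of how this plays out, using the hypothetical name `ParamFoo` so it does not clash with the earlier `Foo` definitions: the type parameters are filled in from the constructor arguments, so each instance still has concrete field types.

```julia
# Hypothetical name; mirrors the parametric Foo{X,Y} above.
struct ParamFoo{X,Y}
    x::X
    y::Y
end

p = ParamFoo(1, 2.0)            # X and Y are inferred from the arguments
q = ParamFoo("a", [1, 2, 3])    # a different concrete type from the same definition

p_type = typeof(p)
p_fields_concrete = all(isconcretetype, fieldtypes(p_type))
q_fields_concrete = all(isconcretetype, fieldtypes(typeof(q)))
```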

Well, yes and no.
Yes, it is important to preallocate a vector with a concrete element type for performance.
However, you can do it without specifying the argument types in many cases, as there are helper functions:
  • similar(x) - creates an uninitialized array with the same type and the same dimensions as x
  • typeof(x) - returns the type of x, which can be used as a constructor
  • eltype(x) - returns the type of the elements in x

For example, for an x::Vector{Float64}, each of the following expressions creates an uninitialized Vector{Float64} with the same length as x:

#1
y = Vector{Float64}(undef, length(x))

#2
y = typeof(x)(undef, length(x))

#3
y = similar(x)

#4
y = resize!(eltype(x)[], length(x))
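
To make this concrete, here is a small sketch that checks the four variants against each other and then uses `similar` inside an unannotated function. The function name `double_into_new` is made up for illustration.

```julia
x = [0.1, 2.0, 3.5]

y1 = Vector{Float64}(undef, length(x))
y2 = typeof(x)(undef, length(x))
y3 = similar(x)
y4 = resize!(eltype(x)[], length(x))

# All four have the same type and length; their contents are uninitialized.
all_same_type   = all(y -> typeof(y) == Vector{Float64}, (y1, y2, y3, y4))
all_same_length = all(y -> length(y) == length(x), (y1, y2, y3, y4))

# Hypothetical helper: preallocates with a concrete element type
# without a single type annotation.
function double_into_new(x)
    y = similar(x)
    for i in eachindex(x)
        y[i] = 2 * x[i]
    end
    return y
end

doubled_floats = double_into_new(x)
doubled_ints   = double_into_new([1, 2, 3])
```

Because `similar` follows the input, the same function returns a `Vector{Int}` for integer input and a `Vector{Float64}` for floating-point input.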

In my personal opinion, it is good to use type annotations, but usually you shouldn’t specify concrete types. E.g., foo(x::Vector{Float64}) is typically not what you want; foo(x::AbstractVector{<:Real}) is more like it. And when a function takes more than two arguments, you can easily see why one is encouraged not to use type annotations at all 🙂

It is also worth noting that typing in Julia is not the same as typing in C++. In C++, foo(int n) happily accepts all types for n which it can convert into int (i.e., long, char, size_t, even float are all fine, although you may opt into having a warning for the last case). In Julia, foo(n::Int64) means only Int64s are allowed, no Int32s, BigInts, UInt8s etc. As such, annotating function arguments with concrete types might not even reflect the programmer’s intent properly.
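
A short sketch of that difference, with the hypothetical function names `only_int64` and `any_integer`: the concrete annotation rejects an `Int32` with a `MethodError` instead of converting it, while the abstract annotation accepts any integer type.

```julia
# Hypothetical functions for illustration.
only_int64(n::Int64)    = n + 1   # matches Int64 and nothing else
any_integer(n::Integer) = n + 1   # matches every Integer subtype

# No implicit conversion happens: an Int32 argument finds no matching method.
int32_rejected = try
    only_int64(Int32(1))
    false
catch err
    err isa MethodError
end

from_int32 = any_integer(Int32(1))
from_uint8 = any_integer(UInt8(1))
from_big   = any_integer(big(1))
```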


Note that both signatures, foo(x::Vector{Float64}) and foo(x::AbstractVector{<:Real}), may be too restrictive if the code is intended to work with e.g.

Union{Missing,Int}[1,2,3]

or similar, which can easily happen in practice.

It is better not to rely on container element types for dispatch, except possibly for optimized versions which do the same thing.
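
As an illustration, assuming two hypothetical functions `strict_sum` and `loose_sum`, a vector that could hold missing values but currently does not is rejected by the element-type annotation, even though the computation itself would work:

```julia
v = Union{Missing,Int}[1, 2, 3]

# Hypothetical functions for illustration.
strict_sum(x::AbstractVector{<:Real}) = sum(x)
loose_sum(x) = sum(x)

# Union{Missing,Int} is not a subtype of Real, so the annotation does not match.
eltype_is_real = eltype(v) <: Real

strict_rejected = try
    strict_sum(v)
    false
catch err
    err isa MethodError
end

loose_result = loose_sum(v)
```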


Side note: as a Julia beginner, I find the answers and examples given here extremely useful! I would love to see more of these kinds of questions and responses in the manual or FAQ.


A great first step would be for some intrepid member of the community to categorize all the questions on the forum each month, track the most frequently asked and make sure the best answers to those end up in a special doc. 🙂


To add on, if you annotate a function argument as Vector, you can’t pass it the result of skipmissing either, because skipmissing returns a lazy iterator rather than a Vector. That can be frustrating.

Unless you rely on operations that only a Vector supports, you should avoid specifying the container type.
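
A small sketch of the problem, with the hypothetical functions `mean_vec` and `mean_any`: the `Vector` annotation rules out the `skipmissing` wrapper, while an unannotated function that only iterates works with both.

```julia
data = [1.0, missing, 3.0]

# Hypothetical function that insists on a Vector.
mean_vec(x::Vector) = sum(x) / length(x)

# Hypothetical function that only assumes its input can be iterated.
function mean_any(itr)
    total = 0.0
    n = 0
    for v in itr
        total += v
        n += 1
    end
    return total / n
end

skipped = skipmissing(data)      # a lazy wrapper, not a Vector
skipped_is_vector = skipped isa Vector

vec_rejected = try
    mean_vec(skipped)
    false
catch err
    err isa MethodError
end

mean_of_skipped = mean_any(skipped)
```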


I completely agree!


This is one of the rare situations where I think my view is different to most people here, so follow my advice at your own risk 🙂

I personally like to include type annotations on both the inputs and outputs to most of the code I write. This is for the simple reason that I find that when I come back and look at a piece of code three months later, the type annotations are very helpful in reminding my brain how a particular bit of code is structured.

It is worth emphasizing that a lot of the code I write is just for me, so keeping things as general as possible for the sake of others who might use my code is less of a priority. Having said that, I do have a few registered statistical packages, and even in those packages I always include annotations, albeit I’ve gone to some effort to make sure the annotations are as general as possible, and (for example) allow for things like Missing when working with number types.


One aspect I find relevant is to think about the way code will error given different inputs. Sometimes the author assumes inputs will be of a certain type, then I think it should be type annotated. Any other type working with that function would be accidental, and errors with other types could possibly happen way later in the stack. This can lead to frustrating bugs.

When someone writes a generic function, I think it’s again important to check the generic assumptions and give meaningful error messages if they aren’t met. For example, one can check whether a type can be iterated, whether its length is known, or whether it’s a bits type. Sometimes those checks can be expressed as traits.

Anyway, my point is: the more generic a pipeline, and the fewer assumptions are checked in the code with meaningful explanatory error messages, the more likely you are to hit bugs deep down the stack without knowing that you violated an implicit assumption further up.
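
One possible shape for such a check, sketched with a hypothetical function `checked_mean`: it stays generic but tests the "length is known" assumption up front via the `Base.IteratorSize` trait and fails with an explanatory message rather than an error deeper in the stack.

```julia
# Hypothetical generic function that validates its assumptions up front.
function checked_mean(itr)
    # Trait check: the input must report a known length or shape.
    if !(Base.IteratorSize(itr) isa Union{Base.HasLength,Base.HasShape})
        throw(ArgumentError(
            "checked_mean requires a collection with a known length; got $(typeof(itr))"))
    end
    isempty(itr) && throw(ArgumentError("checked_mean requires a non-empty collection"))
    return sum(itr) / length(itr)
end

ok_result = checked_mean([1, 2, 3])

# skipmissing has no known length, so the check reports this directly.
unknown_length_message = try
    checked_mean(skipmissing([1, missing, 3]))
    ""
catch err
    err isa ArgumentError ? err.msg : rethrow()
end
```

Whether a trait check like this or a plain type annotation is the better fit depends on how generic the function is meant to be.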
