Why specify argument types and return types

New Julia user here. Trying to understand the benefit of this:

function test(p::Float64,x::Int) :: Float64

<do something>

end

over the more pythonic:

function test(p,x)

<do something>

end

The same question could be asked of composite types. Is this:

struct myStruct{T<:Real}
    x::T
    y::T
end

better than this:

struct myStruct
    x::Float64
    y::Float64
end

or this:

struct myStruct
    x
    y
end

My stab at an answer:

1- Declaring the return type for a function should improve performance because the compiler will know the type of the return value.
2- Declaring the types of the inputs shouldn’t improve performance necessarily. Each time this function is called with new (different) parameter types, it will compile a version of the function for that particular set of inputs. Leaving the types off of the input arguments makes the function more general and Julia will optimize on the fly.

The biggest benefit is that you can use multiple dispatch to do different things for different types. If you’re an end user just writing scripts, you often don’t need any type annotations at all.

But won’t Julia use multiple dispatch even on a function like this:

function test(x,y)
    x
    y
end

because Julia will compile a new version of this function every time the function gets called with new parameter types??

Yes, but with only one method you can’t do different things for different types.

I can see that: wanting a function to behave differently for different types of inputs. But is there a performance boost that comes with all these declarations?

That’s not multiple dispatch, that’s compile-time specialization.

Multiple dispatch means having different versions of the functions (different methods) for different argument types.
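A minimal sketch of the distinction, using hypothetical function names (`double`, `kind`) that are not from the thread. `double` has a single method that the compiler specializes for each concrete argument type it meets; `kind` has several methods, and dispatch chooses between them.

```julia
# One method: the compiler generates specialized code per concrete argument type,
# but the behavior is the same for every type.
double(x) = 2x

# Several methods: dispatch selects a different implementation per argument type.
kind(x::Integer) = "integer"
kind(x::AbstractFloat) = "float"
kind(x) = "something else"

println(double(3), " ", double(1.5))
println(kind(3), ", ", kind(1.5), ", ", kind("hi"))
println(length(methods(double)), " method vs ", length(methods(kind)), " methods")
```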

Three reasons to declare argument types:

  • Clarity: if you want to give more information to the user about what type your function is expecting.
  • Correctness: if your function may give wrong results or unexpected errors unless the arguments have certain types.
  • Dispatch: If you want to implement different versions (methods) of your function for different types.

Not on this list is performance: as you say, the Julia compiler always specializes a function at compile-time for the concrete argument types, so there is no performance benefit to an argument-type declaration.

  • Caveat: there are exceptions where Julia’s compiler will not type-specialize unless you explicitly add a declaration, e.g. for higher-order functions, but this is a relatively rare concern.
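As a rough illustration of that caveat (hypothetical names, modeled on the pattern described in the performance tips): when a function argument is only passed through to another call, the compiler may choose not to specialize on it, and a type parameter on the argument requests specialization. Whether this matters in practice depends on the code; both versions return the same result.

```julia
square(i) = i^2   # hypothetical helper

# `f` is only forwarded to `map`, so the compiler may skip specializing on its type.
forward_plain(f, xs) = map(f, xs)

# The type parameter F asks for a specialization per function type.
forward_typed(f::F, xs) where {F} = map(f, xs)

println(forward_plain(square, [1, 2, 3]))
println(forward_typed(square, [1, 2, 3]))
```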

On the other hand, if you over-specify the argument types (e.g. using Float64 instead of just Real or Vector{Float64} instead of AbstractVector{<:Real}), then you may be unnecessarily restricting the generality and composability of your function (with no performance benefit).
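A small sketch of what over-specifying costs, with hypothetical names (`sum_strict`, `sum_generic`). The strict signature rejects integer vectors, views and ranges, while the looser one accepts all of them.

```julia
sum_strict(v::Vector{Float64}) = sum(v)
sum_generic(v::AbstractVector{<:Real}) = sum(v)

println(sum_generic([1, 2, 3]))                  # Vector{Int}
println(sum_generic(view([1.0, 2.0, 3.0], 1:2))) # a view (SubArray)
println(sum_generic(1:3))                        # a range

# The strict version has no method for these inputs:
println(applicable(sum_strict, [1, 2, 3]))       # false
println(applicable(sum_strict, 1:3))             # false
```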

Declaring return types is generally unnecessary too, but it can occasionally be convenient if the function has several points where it may return a value and you don’t want to check individually that each of them returns the same type, i.e. if you want to enforce type stability in a slightly lazy way, or in rare cases if you want to correct for some failure of type inference. Again, be careful that in doing so you don’t overly restrict your function’s generality.
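A minimal sketch of the "lazy type stability" use, with a hypothetical function `half`. The declared return type converts every returned value, so the two return points agree on a type; it also shows how the declaration restricts generality.

```julia
function half(x)::Float64
    if x == 0
        return 0        # an Int literal; the declaration converts it to 0.0
    end
    return x / 2
end

println(half(0))   # 0.0
println(half(3))   # 1.5

# The price: inputs whose result cannot be converted to Float64 now fail.
# half(1 + 2im) throws an InexactError.
```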

People have been mostly answering the first part of your question, about type annotations in function inputs and outputs. Those annotations do not affect performance (with a few exceptions). These annotations are a way to make use of dispatch, and to enforce argument types.

The second part of your question concerns types you define. There, annotations are very important for performance. To help the compiler generate the fastest code possible, the fields of a type should be declared with “concrete” types.

The first two examples you gave are concrete in this sense, because the annotated field types are themselves concrete (here, T, once it is instantiated with a concrete type, or Float64):

struct myStruct{T<:Real}
x::T
y::T
end

and

struct myStruct
x::Float64
y::Float64
end

The only difference between the two is that the first one is parameterized: you are defining a family of types. Note that if you annotate with an abstract type, e.g. replacing Float64 by Real, performance will suffer.

The last case, where you are not even specifying a type at all, is the worst for performance:

struct myStruct
x
y
end

Those kinds of definitions are to be avoided.
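One way to check which case a definition falls into is to ask whether its field types are concrete. This is a sketch with hypothetical struct names (distinct names are used so that all four variants can coexist in one session).

```julia
struct ParamPoint{T<:Real}   # parameterized, concrete once T is chosen
    x::T
    y::T
end

struct FloatPoint            # concrete field types
    x::Float64
    y::Float64
end

struct RealPoint             # abstract field types
    x::Real
    y::Real
end

struct AnyPoint              # untyped fields default to Any
    x
    y
end

for P in (ParamPoint{Float64}, FloatPoint, RealPoint, AnyPoint)
    T = fieldtype(P, :x)
    println(P, ": field x is ", T, ", concrete = ", isconcretetype(T))
end
```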

Makes sense: if I specify Float64, then the function won’t work for integer inputs, but if I specify Real then it will work for both integers and floats.

Thanks for the reply. Great instruction.

See the manual for more details on type definitions and performance:

https://docs.julialang.org/en/v1/manual/performance-tips/#Type-declarations

So there is no performance difference between these two?

In the first case, I can declare the type like this:

myStruct{Float64}(5,6)

but I can’t use that syntax on the second definition.

None — myStruct{Float64} is exactly equivalent in performance and memory layout to the second type.

(Think of myStruct{T} not as a single type but as a set of types, which allows you to write type-generic code but still get the performance benefits of concrete struct layouts for particular instances.)
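The "set of types" view can be checked directly. This is a sketch with hypothetical names (`GenericPair`, `FloatPair`) standing in for the two definitions discussed above.

```julia
struct GenericPair{T<:Real}
    x::T
    y::T
end

struct FloatPair
    x::Float64
    y::Float64
end

println(isconcretetype(GenericPair))            # false: a family of types
println(isconcretetype(GenericPair{Float64}))   # true: one member of the family
println(GenericPair{Float64} <: GenericPair)    # true

# Same storage as the hard-coded Float64 version: two Float64 fields each.
println(sizeof(GenericPair{Float64}) == sizeof(FloatPair))

# The parameter is chosen from the arguments when it is not given explicitly.
println(typeof(GenericPair(1, 2)) == GenericPair{Int})
```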

Correct.

There is no need for that syntax since the types of the fields must be Float64. You can write

myStruct(5,6)

and just like in the other case the default constructor will simply convert the numbers:

julia> myStruct(5,6)
myStruct(5.0, 6.0)

No, unfortunately, not in general. You can always add a new method to that function that returns a different type, so the compiler cannot rely on the declaration alone. If the compiler knows the input types, it knows exactly which method will be called and can usually infer the return type as well, so nothing is gained (except in the cases, which I think are rare, where inference fails at that point). If it does not know the input types (i.e., inference could not prove they can only be of type X), it does not know which method will be called, and you are back to square one: annotating the return type in the method signature does not help, because the compiler does not even know that this method body is the one that will run. However, annotating the return type at the point where the function is called (not where it is defined) will solve some inference problems for you.
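A minimal sketch of a call-site annotation, with hypothetical names (`SETTINGS`, `lookup`, `scaled`). The container holds values of type Any, so the compiler cannot infer what `lookup` returns; the assertion at the call tells it, and throws a TypeError if the value is ever something else.

```julia
# Values are stored as Any, so the result of a lookup cannot be inferred.
const SETTINGS = Dict{String,Any}("scale" => 2.5)
lookup(key) = SETTINGS[key]

function scaled(x)
    s = lookup("scale")::Float64   # type assertion at the call site
    return s * x                   # from here on, s is known to be a Float64
end

println(scaled(2.0))
```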

One additional point that no one has mentioned yet: I often annotate both my input and output types on functions even though it makes no difference to performance. I do this because I find that when I look at the code in three months’ time I’m able to pick up on what the code does much, much faster.

Continuing that thought, wouldn’t this also be helpful if you wanted to create documentation using Documenter.jl? (I’m guessing as I haven’t used it yet)

There is a long discussion about that in the section “Over-constraining argument types” of

https://www.oxinabox.net/2020/04/19/Julia-Antipatterns.html

Let me preface this with the statement that, in my “program” space, a method is written to perform an operation on a specific set of data. It doesn’t need to be generic because it will never be called with different data types. (There are rare exceptions, and when I hit those exceptions I don’t declare the data types, or I use the where syntax.)

I tend to follow a fairly “object oriented” design format, not sure if that’s just the nature of what I’m programming or because of how I was taught. That means that 90% of my methods take a structure as their first parameter and the method does some operation ON that structure.

Defining the type of at least that first parameter helps avoid name collisions where I define a method foo for two different structures and the operation of foo for those two structures is wildly different. If both of those foos take the same number of parameters but didn’t declare their type then Julia would have no choice but to destroy the first definition when the new definition is created.
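A sketch of that situation, using hypothetical types and function names (`Circle`, `Square`, `area`, `label`). With typed first arguments the two definitions are separate methods of one function; without types, the second definition replaces the first.

```julia
struct Circle
    r::Float64
end

struct Square
    side::Float64
end

# Typed: two methods coexist and dispatch picks the right one.
area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2

# Untyped: same name and same number of arguments, so the second
# definition overwrites the first.
label(shape) = "first definition"
label(shape) = "second definition"

println(length(methods(area)), " methods for area")
println(length(methods(label)), " method for label: ", label(Circle(1.0)))
```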

So defining the types allows me to not worry about it. I could by default not define the types unless needed but then this gives my code an “inconsistent” feel where some methods have their types declared and some don’t. I personally like it when all my code smells the same, so I err on the safe side (with more typing) and define my parameter types.

If my programming “space” needed more generic functions that could handle different data types, then I might change my story. For me and what I’m doing, declaring the types doesn’t cause harm and, as @colintbowers said, helps me remember what the function was supposed to do months later.

I agree with you overall. Yet I have been in the situation of implementing a package that works with matrices and passes slices of those matrices to inner functions for some computations. Then I found out that views were a good thing, and had to rewrite all the function annotations to accept views as well. Then I found out that StaticArrays was interesting, and had to revise everything to make sure those would conform to the function annotations too. Then I realized that it could be interesting to use FieldVectors from StaticArrays… and again. So, even in my very narrow programming space, over-constraining the data types of function inputs has led me to do much more work than I needed to.

At that point I started to follow Lyndon White’s advice whenever possible and, where needed, to make the code clearer by adding comments.

I see your pain; I’ve experienced a different pain. (In other languages) I’ve written classes with generic base classes for that “what if” scenario: if we ever needed to add another derived class, we could reuse all this code… and then never saw all that beautiful code reused in the slightest.

So I guess I’ve become a bit jaded: unless I see it on the roadmap, I don’t program for it. I think it really comes down to this: you can spend the time now and risk it never being used, or not spend the time now and (maybe) spend more in the future making the code more generic.

Granted generic methods and generic classes are a bit different, but you can still run into situations like: do I do for i = 1:length(a) or should I do for i in eachindex(a) just in case we want to pass a string or some other object that doesn’t have sequential indexes…

OK, but it literally takes less effort to not restrict the argument types at all. It is also very hard to find a situation in which just using for i in eachindex(a) without thinking is not the best course of action. Julia makes being generic take no more effort, or even less effort, than being specific; consequently, the only disadvantage that I see in your approach with generic base classes (the extra effort) does not arise in the cases discussed (in Julia).
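A small sketch of the string case mentioned above, with a hypothetical helper `chars_by_index`. For a string with multi-byte characters the valid indices are not 1, 2, 3, so `eachindex` works where `1:length(s)` would hit an invalid index.

```julia
function chars_by_index(s)
    out = Char[]
    for i in eachindex(s)    # only visits valid indices of s
        push!(out, s[i])
    end
    return out
end

word = "αβγ"                       # each character takes two bytes in UTF-8
println(collect(eachindex(word)))  # the valid indices: 1, 3, 5
println(length(word))              # 3 characters
println(chars_by_index(word))

# With `for i = 1:length(word)`, indexing word[2] throws a StringIndexError.
println(chars_by_index(['a', 'b']))   # the same code works for arrays
```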
