How is it that new Julia programmers tend to abuse type annotations?

lrnv · January 7, 2024, 1:40pm

Spending some time on this forum and reading around, I have noticed a recurring mistake among newcomers (myself included some time ago, of course!):

Superfluous type annotations on function arguments: people seem to have trouble understanding that these only restrict the applicability of the method and do not contribute to performance.
Type annotations on a local variable to force its type: these can even be harmful, hiding type instability behind a conversion. My advice would be to never use them.
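To make the second point concrete, here is a minimal sketch (the function names are made up for illustration) of how a local type declaration converts on every assignment, so an unexpected or abstract input type is silently papered over instead of being reported:

```julia
# Local variable declaration: every assignment to `acc` is converted to Float64.
function sum_declared(v)
    acc::Float64 = 0
    for x in v
        acc += x          # result converted back to Float64 after each addition
    end
    return acc
end

# Generic version: the accumulator takes its type from the data.
function sum_generic(v)
    acc = zero(eltype(v))
    for x in v
        acc += x
    end
    return acc
end

sum_declared([1, 2, 3])    # 6.0: Int input silently turned into a Float64 result
sum_generic([1, 2, 3])     # 6
sum_declared(Any[1, 2.5])  # 3.5: still infers as Float64, although the loop body
                           # is dynamically typed, which hides the real problem
# sum_declared([1 + 2im])  # would throw an InexactError: a complex sum cannot
                           # be converted to Float64
```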

Is there a reason for this common mistake? Maybe other languages behave differently and people wrongly transpose that behavior to Julia? Or is my perception of how frequent the issue is somewhat biased?

stevengj · January 7, 2024, 1:47pm

I think that it’s mostly a habit from other languages — they are taught that type declarations = static typing = fast, and untyped variables = dynamic typing = slow. Automated type inference and specialization are more exotic concepts (if they are available at all) in many popular languages, whereas in Julia they are central.
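To see the specialization described here, a small sketch (the function name `double` is made up) that asks the compiler what it infers for an unannotated function at a few concrete argument types. Each call gets its own compiled method instance with a concrete result type, with no annotation needed:

```julia
# No type annotations at all.
double(x) = 2x

# Inference runs separately for each concrete argument type.
Base.return_types(double, (Int,))            # [Int64] on a 64-bit machine
Base.return_types(double, (Float64,))        # [Float64]
Base.return_types(double, (Complex{Int},))   # [Complex{Int64}]
```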

lrnv · January 7, 2024, 1:48pm

So maybe I was not exposed to these languages before coming to Julia, hence my incomprehension. Could you point me to another language that requires type annotations for performance reasons?

stevengj · January 7, 2024, 2:01pm

There are the classic statically typed languages like C and C++ and Fortran and Java where people generally declare everything (C++ nowadays has auto and Java 10 gained local type inference, but I think these techniques are not usually what people first learn, and anyway there are still lots of cases where you need to declare types explicitly). In Python people are pushing towards adding static type hints (it doesn’t yet affect performance AFAIK, though many people seem to think it will), but there are also extensions like Cython that require type annotations for performance. Coming from these sorts of languages, type annotations make code “look faster”.

(Yes, there are lots of static languages like Haskell and Rust that have more extensive type inference, but I don’t think the majority of people trained in scientific computing are coming from those languages. It’s more rare to have a dynamic language designed for intensive type-inference optimization, I think, so I suspect that people instinctively put type declarations to make the code look more “static”.)

Put another way, Julia’s performance model is a bit different from either a traditional statically typed language or mainstream dynamic languages, and it can take a while to calibrate your coding style for this, and to know where you can trust the compiler to help you. It’s pretty natural to (a) over-type code because you don’t trust the compiler enough, and (b) bend over backwards to express code in terms of stdlib functions because other dynamic languages train you that “builtin functions are fast, user code is slow”.
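As a rough illustration of point (b), a plain hand-written loop is compiled to native code just like library code is, so there is no penalty merely for not calling a builtin. The name `mysum` is made up. Base's `sum` may still be somewhat faster on Float64 data because it uses a blocked, SIMD-friendly pairwise algorithm, but that gap comes from the algorithm, not from user code being slow:

```julia
# A plain hand-written loop with no type annotations.
function mysum(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

v = rand(10^5)
mysum(v) ≈ sum(v)   # true; results may differ in the last bits because
                    # Base's sum uses pairwise summation
mysum([1, 2, 3])    # 6, an Int, since nothing forces a particular type
```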

joa-quim · January 7, 2024, 2:21pm

Well, I don’t see them as superfluous but as an @assert

JET never ceases to surprise me. For example, if this_number is of type Any, then string(this_number) is also Any. So I have to do
string(this_number)::String
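A minimal sketch of this pattern, using a made-up option dictionary and function names: the typeassert is checked at run time (a TypeError is thrown if some method of `string` ever returns something else), and in exchange inference sees a concrete String downstream:

```julia
# Options collected from user input end up in a Dict with an abstract value type.
opts = Dict{String,Any}("name" => :grid, "n" => 10)

# Without the assertion, what inference concludes for `string` on an `Any`
# argument depends on the Julia version and on which packages are loaded.
label(d) = string(d["name"])

# With the typeassert, callers are guaranteed a String.
label_asserted(d) = string(d["name"])::String

label_asserted(opts)                                    # "grid"
Base.return_types(label_asserted, (Dict{String,Any},))  # [String]
```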

stevengj · January 7, 2024, 2:32pm

Yes, but it’s a common mistake to over-type in ways that reduce the generality of the code, and it can also make code less readable to have lots of redundant type declarations everywhere.

See also the manual’s discussion of Argument-Type Declarations.

Once you have an abstractly typed variable, it shouldn’t be too surprising that it tends to defeat type inference downstream as well. (But once you are willing to have an abstractly typed variable, do you care?)

joa-quim · January 7, 2024, 2:36pm

I sure do, because those Any types propagate, and they were Any in the first place because I couldn’t avoid it. The situation normally arises from parsing user input, where options can be passed in a variety of types that land in a Dict{String, Any}.

lrnv · January 7, 2024, 2:44pm

This is not the situation I’m talking about: you are talking about typing outside data once and for all when you get it, whereas I am talking about typing variables in the middle of functions, even in the middle of loops, in the (false) hope of performance gains.

stevengj · January 7, 2024, 2:45pm

Yes, but in most circumstances a function barrier is sufficient. That is, if you call f(x) on an x that was not inferred, it still dynamically dispatches to f compiled for the concrete runtime type. As long as f is expensive enough that the dynamic-dispatch overhead is negligible, it will run quickly. (Yes, the result of f will then be boxed, so the process will repeat on the next call. Sometimes you do want a typeassert to prevent dynamic dispatch on subsequent calls, but I think the need for this is relatively rare.)
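A minimal sketch of a function barrier, assuming a hypothetical option dictionary like the one joa-quim describes (all names here are made up). The Dict lookups are inferred as Any, but the work happens inside `kernel`, which is compiled for the concrete runtime types; the asserted variant additionally makes the result concrete for whoever calls it:

```julia
# Hypothetical option dictionary, e.g. built while parsing user input.
opts = Dict{String,Any}("n" => 10, "scale" => 2.5)

# The kernel: Julia compiles a specialized version for each concrete
# combination of argument types it is called with.
function kernel(n, scale)
    s = zero(scale)
    for i in 1:n
        s += i * scale
    end
    return s
end

# Function barrier: one dynamic dispatch into `kernel`; inside it,
# everything is concretely typed.
compute(d) = kernel(d["n"], d["scale"])

# With typeasserts, the result is also inferred concretely (Float64),
# which avoids further dynamic dispatch in the caller.
compute_asserted(d) = kernel(d["n"]::Int, d["scale"]::Float64)

compute(opts)            # 137.5
compute_asserted(opts)   # 137.5
```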

nsajko · January 7, 2024, 4:02pm

In general it doesn’t hurt to put a type assertion when you need/expect some type somewhere. Cross-reference: JuliaLang/julia issue #42372 on GitHub, “Require constructors and `convert` to return objects of stated type?”

joa-quim · January 7, 2024, 4:32pm

I think I brought this up some years ago, but I still can’t understand why string(something) is not guaranteed to return a String (and the same for FloatXX(...) and others). And it’s not me saying that, it’s JET.

nsajko · January 7, 2024, 5:20pm

Probably you’re using some package (possibly indirectly), that defines an (imperfect) method for Base.string. Hard to tell without more information, but I can tell you this: when starting a fresh Julia REPL, for all of Julia v1.9, v1.10 and nightly (future v1.11), all methods infer as either String or AbstractString. This is the command: Base.return_types(string, Tuple{Any}) (the entries correspond to the output of methods).
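A small sketch of running the check described here and filtering for any entry that is not a subtype of AbstractString. In a fresh session nsajko reports none; an imprecise method added by some loaded package would show up in the filtered list. Exact output depends on the Julia version and loaded packages:

```julia
# Inferred return type of each `string` method that could match a single
# argument of unknown type.
rts = Base.return_types(string, Tuple{Any})

# Entries that are not subtypes of AbstractString (Union{}, for methods that
# always throw, counts as a subtype and is not listed).
suspicious = filter(T -> !(T <: AbstractString), rts)
```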

jkopper · January 7, 2024, 5:24pm

I feel compelled to chime in here as a newer user who has made (and continues to make) this error. The reason is that with the other languages I know, it is never bad to add a type annotation. Indeed, it is often required by the compiler. Frankly, I am baffled by the notion that giving the compiler more information could ever be bad.

John_Gibson · January 7, 2024, 5:30pm

julia> f(x) = 2x  # no type annotation....
f (generic function with 1 method)

julia> f(3)       # good
6

julia> f(3+2im)   # good
6 + 4im

julia> g(x::Int64) = 2x  # with type annotation
g (generic function with 1 method)

julia> g(3)       # good
6

julia> g(3+2im)   # bad!
ERROR: MethodError: no method matching g(::Complex{Int64})

Closest candidates are:
  g(::Int64)
   @ Main REPL[85]:1

Stacktrace:
 [1] top-level scope
   @ REPL[87]:1

jkopper · January 7, 2024, 5:37pm

I suppose I disagree that the behavior of the function g in your example is “bad.” If I wanted a function that worked for another type, I would annotate it that way, and I feel like if I want something to work for every numerical type, then I should annotate it with an abstract type.

I’m of course not arguing that my approach is correct or optimal, just that it’s surprising that it isn’t.

John_Gibson · January 7, 2024, 5:45pm

The argument is that, for many mathematical functions, the same sequence of operations produces the desired result for inputs of many different types. So when possible, it’s best to leave off the type annotation.
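A short sketch of why even an abstract annotation such as `Number` draws a line the code itself does not need (the function `h` is made up for comparison). The body `2x` is valid for rationals, complex numbers and matrices alike, but `h` rejects the matrix:

```julia
f(x) = 2x             # no annotation
h(x::Number) = 2x     # "reasonably general" abstract annotation

f(3)              # 6
f(3//4)           # 3//2
f(3 + 2im)        # 6 + 4im
f([1 2; 3 4])     # [2 4; 6 8]: a matrix also supports `2x`
h(3 + 2im)        # 6 + 4im
# h([1 2; 3 4])   # MethodError: a Matrix is not a Number
```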

A practical example is automatic differentiation using ForwardDiff, which works by plugging a dual-number type (ForwardDiff.Dual: a real value plus partial-derivative components attached to an abstract infinitesimal epsilon) into user-defined functions. ForwardDiff would work on the above f(x) as is, but not on g(x).
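A self-contained sketch of the mechanism, using a made-up toy type `ToyDual` rather than ForwardDiff itself (ForwardDiff's real `Dual` is far more complete, though it is likewise a subtype of Real). Only the one operation that `2x` needs is defined; seeding the derivative part with 1 yields f(3) and f'(3) together, while the Int64-annotated g simply refuses the input:

```julia
# Toy dual number x + x'ε with ε^2 = 0 (hypothetical, for illustration only).
struct ToyDual <: Real
    val::Float64   # value
    der::Float64   # derivative part
end

# Just enough arithmetic for `2x`: scaling by a plain real number.
Base.:*(a::Real, d::ToyDual) = ToyDual(a * d.val, a * d.der)

f(x) = 2x            # generic, as in the example above
g(x::Int64) = 2x     # annotated, as in the example above

y = f(ToyDual(3.0, 1.0))
(y.val, y.der)       # (6.0, 2.0): f(3) = 6 and f'(3) = 2
# g(ToyDual(3.0, 1.0))   # MethodError: no method matching g(::ToyDual)
```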

mkitti · January 7, 2024, 5:46pm

Note that types are sometimes not about performance but rather about correctness and program validation. Also, I would not call these “annotations”: they have real effects in the language.

The subtle difference between return type assertion and local variable assertion is unfortunate, but important to understand.

julia> f(x)::Int = x # coerce x to being an Int, error if exact conversion is not possible.
f (generic function with 1 method)

julia> f(3.5)
ERROR: InexactError: Int64(3.5)

julia> f(3.0)
3

julia> g(x) = x::Int # I believe x should be an `Int`, error if that is not the case
g (generic function with 1 method)

julia> g(3.5)
ERROR: TypeError: in typeassert, expected Int64, got a value of type Float64

julia> g(3.0)
ERROR: TypeError: in typeassert, expected Int64, got a value of type Float64
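The same `::` syntax has a third meaning worth keeping apart from these two: a declaration on a local variable, which converts like the return-type form rather than merely checking like the typeassert. A small sketch with made-up function names:

```julia
# 1. Return-type declaration: the result is converted with convert(Int, ...).
ret_decl(x)::Int = x

# 2. Typeassert in an expression: checked, never converted.
expr_assert(x) = x::Int

# 3. Local variable declaration: like (1), every assignment to `y` is converted.
function local_decl(x)
    y::Int = x
    return y
end

ret_decl(3.0)       # 3
local_decl(3.0)     # 3
expr_assert(3)      # 3
# expr_assert(3.0)  # TypeError: in typeassert, expected Int64, got a value of type Float64
# local_decl(3.5)   # InexactError: Int64(3.5)
```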

That said, people often use types incorrectly in Julia as well. For example, there are well-known issues with code that assumes an AbstractArray starts at index 1, even though its firstindex need not be 1:

function myfirst(A::AbstractArray)
    isempty(A) ? throw(ArgumentError("Array cannot be empty")) : A[1]
end
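For comparison, a sketch of an index-agnostic version (renamed `myfirst_fixed` here to keep it apart from the original). Inside indexing, `begin` means `firstindex(A)`; Base's own `first(A)` would also do. To exercise it without extra packages, the sketch uses `Base.IdentityUnitRange`, an unexported Base array type whose valid indices here are 3:5, so `r[1]` would throw a BoundsError:

```julia
# Works for any AbstractArray, whatever its first index is.
function myfirst_fixed(A::AbstractArray)
    isempty(A) && throw(ArgumentError("Array cannot be empty"))
    return A[begin]
end

myfirst_fixed([10, 20, 30])      # 10

r = Base.IdentityUnitRange(3:5)  # indices 3:5, and r[i] == i
myfirst_fixed(r)                 # 3, whereas the A[1] version hits a BoundsError
```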

On superfluous type annotations: I would not discourage them outright if they are used correctly, especially with concrete types. There are good reasons other than performance to use them in certain circumstances, even where they may be detrimental to performance.
Type assertions on a variable can be very useful, especially in the global context in recent versions of Julia.

Honestly, I think we should carefully re-evaluate what best practices are given certain needs and criteria. Performance first and only is not everyone’s objective. Rather consider the needs of someone who actually wants a program of limited scope with provable and correct functionality.

joa-quim · January 7, 2024, 5:59pm

Agree.

Mason · January 7, 2024, 6:00pm

There actually is an optimization which does this, it’s known as “world splitting” (which is different from union splitting!). What world splitting does is it looks at a call like f(::Any) and then it looks at the methods of f. If f does not have very many methods, and they all return the same (or similar) types, then the optimizer will use that information to optimize the call.

Here’s a demo of it in action:

f(::Int) = 1;
f(::Float64) = 2;
f(::Complex) = 3;

and then

julia> Core.Compiler.return_type(f, Tuple{Any})
Int64

Voila! Unfortunately, now with the benefit of hindsight, this optimization is often seen as a kinda bad idea. The problem is that the result of this optimization depends strongly on non-local information, and can suddenly fail and de-optimize if someone somewhere else defines a method.
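Continuing the demo, a sketch of the fragility described here: once one more method with a different return type is defined (the `f(::String)` method is made up), the same query no longer returns Int64, and any compiled code that relied on the earlier answer has to be invalidated. `Core.Compiler.return_type` is internal, so exact results may differ between Julia versions:

```julia
# Same three methods as in the demo, all returning Int.
f(::Int) = 1
f(::Float64) = 2
f(::Complex) = 3

before = Core.Compiler.return_type(f, Tuple{Any})   # Int64, as in the demo

# Someone elsewhere adds a method with a different return type...
f(::String) = "three"

# ...and the inferred result for an `Any` argument is no longer Int64
# (typically a wider type such as Any or a Union).
after = Core.Compiler.return_type(f, Tuple{Any})
```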

Basically, suppose someone has code that does something like string(::Any), and the compiler is able to infer that into string(::Any)::String. Now suppose you compiled code that relies on this result.

Problems can then occur if you load a package which adds a method to string that doesn’t return a string. Suppose it was a symbolic package and they defined some lazy-symbolic thing such that string(::Sym)::Sym or something. Now suddenly your compiled code is invalid and needs to be re-compiled before it can be used.

Situations like this may seem rare, but it’s actually been a quite big source of hard-to-fix latency bugs out there in a lot of julia packages.

However, there has recently been some talk about creating a way to statically guarantee that a function like string always returns a String, or that constructors FloatXX always return an object of that type. This would be nice, but it doesn’t exist yet.

jkopper · January 7, 2024, 6:05pm

This is a revelation to me and a little bit shocking. I had assumed it worked exactly the opposite way: that your function f would error on float inputs and your function g would try to coerce the return value to an integer!

Perhaps this confusion helps answer the original question that started the thread.