How is it that new Julia programmers tend to abuse type annotations?

Besides the compiler inferring a variable as Any, are other types of type instabilities possible?

Morning. I am going to be polite too; it’s the minimum.

This Julia function “always knows everything” because it needs no type info about its arguments.

```julia
function savantA(x, y, z)
    # No annotations: the function accepts whatever types the operations support.
    r = sqrt(x + y*y) / z
    println(" savantA:: x' typ= ", typeof(x), " y' typ= ", typeof(y),
            " z' typ= ", typeof(z), " r' typ= ", typeof(r))
    println(" savantA:: result= ", r)
end

function savantB(x::Unsigned, y::Unsigned, z::Unsigned)
    # This function is "effective" because the compiler has some info about
    # the expected arguments.
    r = sqrt(x + y*y) / z
    println(" savantB:: x' typ= ", typeof(x), " y' typ= ", typeof(y),
            " z' typ= ", typeof(z), " r' typ= ", typeof(r))
    println(" savantB:: result= ", r)
end
```

A “newcomer” thinks that Julia is a “savant” about function arguments and runs it this way …

```julia
julia> savantA(123456789123, 123456789123, 123456789123)
 savantA:: x' typ= Int64 y' typ= Int64 z' typ= Int64 r' typ= Float64
 savantA:: result= 0.017312366074935987
```

If the “newcomer” suspects it could be better to “tell” Julia, even if only implicitly, something more about the args …

```julia
julia> savantA(123456789123.0, 123456789123.0, 123456789123.0)
 savantA:: x' typ= Float64 y' typ= Float64 z' typ= Float64 r' typ= Float64
 savantA:: result= 1.00000000000405
```

BUT if the “newcomer” is forced by the function’s argument declaration to use an ::Unsigned value …

```julia
julia> veamos = UInt128(123456789123);

julia> savantA(veamos, veamos, veamos)
 savantA:: x' typ= UInt128 y' typ= UInt128 z' typ= UInt128 r' typ= Float64
 savantA:: result= 1.00000000000405

julia> savantB(veamos, veamos, veamos)
 savantB:: x' typ= UInt128 y' typ= UInt128 z' typ= UInt128 r' typ= Float64
 savantB:: result= 1.00000000000405
```

Wouldn’t a BigFloat type be better for r?
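A minimal sketch of that idea, promoting the inputs up front so the intermediate x + y*y keeps full precision:

```julia
# Promote to BigFloat before the arithmetic; every intermediate stays BigFloat.
x = y = z = BigFloat(123456789123)
r = sqrt(x + y*y) / z
typeof(r)   # BigFloat
```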

Anyway …

About “instabilities”: someone said that it is better not to specify types in the function arguments. But at the same time Julia does not automatically pick “the best type” to represent the inputs, nor the intermediate results of the operations, nor even the final result. I understand that this is very difficult, and I think that is why it is better to use types where necessary.

Sure, any abstract type. Sometimes small Unions aren’t counted because the compiler can make branches that avoid almost all the type inference issues, but sometimes they’re counted because combinations of inferred Unions in multiple arguments can cascade to Any inferences if it’s not guarded against.
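A minimal sketch of both cases (maybe and combine are hypothetical names):

```julia
# `maybe` returns a small Union the compiler can split into branches.
maybe(x) = x > 0 ? x : missing   # Union{Int64, Missing} for an Int argument

# Each small-Union argument on its own is cheap, but three together give
# inference 2^3 = 8 type combinations to track.
combine(a, b, c) = maybe(a) + maybe(b) + maybe(c)

# julia> @code_warntype combine(1, -2, 3)   # inspect the inferred Unions
```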

…if you want the method to work on any input types. If you need your method to work on particular input types, annotation is recommended. Without type annotations narrowing down a method, we wouldn’t have multimethods and multiple dispatch, a core feature.
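A tiny sketch of that core feature; the annotations are exactly what let several methods of one function coexist:

```julia
# Multiple dispatch: the argument annotations select the method.
describe(x::Integer)       = "an integer"
describe(x::AbstractFloat) = "a float"
describe(x)                = "something else"

describe(1), describe(1.0), describe("hi")
# ("an integer", "a float", "something else")
```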

1 Like

" if you want the method to work on any input types.",
But it’s not really true that "work " with “Any input”. At best it works with a subset of the Types. That’s why many times the function returns a message “MethodError: no method matching…”.
We need functions that work fine with the dataset of our research.

Sure it does. The call will dispatch to the method successfully, and it will execute. If the method contains calls that don’t have implemented methods, those calls throw the MethodErrors; the method only causes them indirectly. But that’s just nitpicking details; you’re right that your method’s signature should try to match its callees. Sometimes that’s not so easy because there aren’t neat supertypes for all your input types. For example, you can push! to all sorts of collections, and the supertype of all of them is Any (the function that calculates this escapes my memory for the moment), so a caller method like foo!(collection, new) = push!(collection, new) doesn’t have neat annotations.
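For instance (grow! is a hypothetical name), this caller works across collections whose only common supertype is Any:

```julia
# push! is implemented for many unrelated collection types,
# so the caller is best left unannotated.
grow!(collection, item) = push!(collection, item)

grow!([1, 2], 3)          # Vector{Int64}
grow!(Set([1, 2]), 3)     # Set{Int64}
grow!(BitSet([1, 2]), 3)  # BitSet
```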

To play the devil’s advocate, this is my opinion after regularly using Julia for around 2 years and not having spent enough time to learn the low level intricacies of type-inference:

Sometimes I want a function to only work on a specific type of variable. Even though leaving it generic could probably make it work on cases I’m not yet thinking about, I find restricting it makes it easier for me to remember what I expected the function to do when I come back to see that code after a few months. As a compromise, I try to prefer abstract types whenever they seem applicable.

A good example here could be that I have a function that was written thinking it would receive a Matrix as an input. In other parts of the code, functions were written expecting a DataFrame. Ideally, I would try to restrict the usage of different types of data structures, but things happen (laziness, distractions) and the code needs to work in a realistic time frame… Some of these cases would give rise to errors if the function was defined as

```julia
function myfunc(thing)
    # stuff...
end
```

And 3 months down the line I attempted to use it with a 3d Array or a DataFrame because it looked like I had already done the work, when it was actually initially written with a Matrix in mind.
In this case, I find that defining it as

```julia
function myfunc(thing::AbstractMatrix)
    # stuff...
end
```

avoids some of the errors the Matrix version hits (for example, an adjoint is a lazy wrapper referencing the parent’s memory, not a Matrix), but also lets me know that I have yet to write the version for a 3-d Array or DataFrame.
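A minimal sketch of that situation (the colsum_* names are hypothetical):

```julia
# Restricting to the concrete Matrix rejects lazy wrappers like Adjoint;
# AbstractMatrix accepts them.
colsum_strict(A::Matrix)        = sum(A; dims=1)
colsum_loose(A::AbstractMatrix) = sum(A; dims=1)

A = [1 2; 3 4]
colsum_loose(A')     # works: Adjoint{Int64, Matrix{Int64}} <: AbstractMatrix
# colsum_strict(A')  # MethodError: an Adjoint is not a Matrix
```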

I’m not trying to imply it would have been impossible to have written a version of myfunc that would be generic enough to fit into all cases at once, but I find it easier to rely on my good friend multiple dispatch to help me decrease the amount of ATP I have to burn in my brain to solve the problem at hand.

Another thing is that I generally find it easier to solve type inference problems by restricting the argument usage right at the beginning than trying to figure out what barrier function to implement. I’m not particularly obsessed about max usability type range in my personal code but I imagine that for publicly available packages this needs to be a bigger concern.

7 Likes

For anyone not using VSCode, Tim Holy’s video showing how to use ProfileView.jl with Cthulhu.jl to debug type inference problems is priceless:

It completely changed my workflow. The first time I used it, I was grappling with some extremely heavy code that became 10 times faster because I could finally figure out where I should be defining barrier functions in a huge chain of functions.
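For anyone unfamiliar with the pattern, a minimal function-barrier sketch (process and kernel are hypothetical names):

```julia
# The type-unstable part (a Dict holding Any) is isolated in `process`;
# dispatch at the `kernel` call picks a method specialized for the
# concrete runtime type.
function process(data::Dict{String, Any})
    xs = data["values"]   # inferred as Any
    return kernel(xs)     # function barrier
end

kernel(xs::Vector{Float64}) = sum(abs2, xs)   # fast, fully typed kernel
kernel(xs) = sum(abs2, float.(xs))            # generic fallback
```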

4 Likes

I don’t quite understand your argument examples here for two reasons.

First issue is just formatting. If you use triple backticks, ``` , around your code blocks, they will be formatted in a much nicer way (similar to other code posted in the thread).

Second issue, is that I can’t easily copy-paste this into a REPL to test the behavior myself. Would you mind cleaning up your examples so they’re easier to replicate?

1 Like

This thread is getting a little heated, and is straying from the original topic.

The original post was asking why people think argument types and redundant local-variable types matter for performance (they generally don’t); that is, it was about people making typing decisions based on flatly false assumptions about Julia’s compiler.
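A quick sketch of why they generally don’t matter: Julia specializes on the actual argument types at each call site, so these two versions compile to the same machine code for a Float64 argument:

```julia
f_untyped(x) = 2x + 1
f_typed(x::Float64) = 2x + 1

# julia> @code_llvm f_untyped(1.0)
# julia> @code_llvm f_typed(1.0)   # same specialized body
```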

Now it’s turning into a general debate over the utility of typing (which borders on the age-old debate over static vs. dynamic languages) … which is a neverending cycle because there are tradeoffs and people with different tastes and with different priorities will come to different conclusions.

There is no dispute that static type declarations can be useful to ensure correctness, and for dispatch, and as a form of documentation. But there is also a tradeoff, because overly restricting the types — even if it is for perfectly valid reasons, like restricting your function to types you have personally tested — can also prevent the re-use and composition of your code. As long as you recognize the tradeoffs, you can come down on whatever side you are most comfortable with.

Many, many functions in the Julia Base library are untyped or loosely typed in order to support wider composability — this is also called duck typing. A classic example would be the sum function, which works with any iterable object that supports +, and as a result the sum function has been broadly applied across the Julia ecosystem. The tradeoff to composability is that trying to compose a type from one package with a function from an unrelated package, even if it conceptually makes sense, will sometimes break in unexpected ways, so you need to do careful testing when you combine packages in new ways.
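A few lines illustrating that breadth:

```julia
sum([1, 2, 3])           # Vector
sum((1.0, 2.0))          # Tuple
sum(x^2 for x in 1:10)   # Generator: 385
sum(Set([1, 2, 3]))      # Set
```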

28 Likes

I could imagine that this problem will be less of an issue when the language server finally displays the types in the code. The integration of Cthulhu.jl into VS Code is a big step.

Meanwhile, it would help if the documentation provided alternatives that address the reasons why beginners annotate their functions. For example, if some beginners add types to their functions because it helps them understand their functions better, the documentation could encourage those people to write type comments similar to those used in Python 2.7.
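A sketch of what such a convention might look like (pure documentation, not checked by the compiler):

```julia
# A "type comment" in the spirit of Python 2.7 annotations:
function myfunc(thing)   # thing :: AbstractMatrix (by convention only)
    sum(thing; dims=1)
end
```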

1 Like

Perhaps that is because the title is somewhat loaded (“abuse”).

In any case, I think that regardless of where a new user starts out on the annotation spectrum (annotate everything with concrete types, e.g. coming from C, or annotate nothing, which is what one would do just from reading the first few chapters of the manual), a few thousand LOC later they will converge to the same nuanced understanding of the issue. So perhaps it does not really matter in the long run.

7 Likes

Agreed. For me, I think @Mason pretty much nailed the central topic a couple of days ago.

I’ve been using Julia since around 2018, and a lot of my earlier attempts at improving performance involved liberally applying type declarations all over the place. I had a vague understanding of type instabilities and that they were important, so it seemed like something that could only help. In fact, since those changes usually got rolled up with other changes, it did feel like it was probably doing something. I’m still amazed by how logarithmic performance can be: a poorly-written implementation of some f might take on the order of 1 s to compute, a couple of tweaks and you’re down to 100 ms, refactor it seemingly-smartly and you’re down to 10 ms and feeling good… post it to Discourse and suddenly somebody has it deep into microseconds territory.

10 Likes

Do you know that they think this is for performance? In my experience people do it out of habit, and for correctness reasons. If you want to think about what a particular function does, it helps to have an idea of what its arguments implement, and in particular, how they relate. This is powerful information that can reveal a lot about a function [1].

@stevengj

There is no dispute that static type declarations can be useful to ensure correctness, and for dispatch, and as a form of documentation. But there is also a tradeoff, because overly restricting the types — even if it is for perfectly valid reasons, like restricting your function to types you have personally tested — can also prevent the re-use and composition of your code. As long as you recognize the tradeoffs, you can come down on whatever side you are most comfortable with.

This is a perfect indication for me that the Julia type system is severely deficient. It forces a completely unnecessary tradeoff. You cannot specify information like “all methods of this function have to satisfy the following type constraints”, nor can you make the equivalent statements about types. E.g. you cannot express the idea that a type passed to your function needs to be iterable.

The map function should have signature map(Callable[T,T’], Iterable[T’]) → Iterable[T]. I really don’t want anyone to be able to create a method for map that does not have this signature.

The fact that the Julia type system does not allow these things, which many other type systems do, means people who want to do these things end up “misusing” it in the sense discussed here.

Of course, by not allowing people to express these things correctly, forcing this steep tradeoff between being completely concrete and specifying nothing, we force massive composability onto the ecosystem, whether or not the composition does correct things or is debuggable. I would much prefer that Julia gain the ability to express these kinds of things. If ForwardDiff fails, then just open a bug / make a pull request and we’ll fix it.

Let’s be concrete: @Mason suggested a situation where a package creates a string(::Sym) that returns a ::Sym rather than a String. But this will break everywhere that people reasonably assume string returns a String! So yes, in cases where people don’t need the fact that string() returns a String, specifying this would limit composability, or rather, it would prevent a specific design pattern (after all, nothing stops you from writing a lazy_string(::Sym) function). But in a million other cases, it would simply catch errors, and properly encode the assumptions we make about how our code is called.
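The closest tool Julia offers today is a return-type declaration on your own methods, which converts or throws at runtime; a sketch with a hypothetical Sym type:

```julia
struct Sym
    name::String
end

# The ::String declaration converts the result (or throws) at runtime,
# so this method's contract is at least enforced at its boundary:
describe(s)::String = string(s)

# If string(::Sym) returned a Sym, calling describe(Sym("x")) would fail
# loudly at the convert instead of silently propagating a Sym.
```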

[1] https://home.ttic.edu/~dreyer/course/papers/wadler.pdf

1 Like

Strict interfaces and traits certainly would be great additions, but are not easily integrated with multiple dispatch, or we would have them already.

That said, I don’t think new Julia users abuse type annotations because of that. It’s just for (the feeling of) safety, IMO. When I started coding in Julia, coming from Fortran, it felt really weird not to declare anything.

3 Likes

Well, there are at least two reasonable signatures that are subtly different:

  1. Haskell: fmap :: Functor f => (a -> b) -> f a -> f b
    I.e., the input and output “iterable” (or Functor) need to be of the same concrete type
  2. Scala: map[B](f: A => B): Iterable[B] method of Iterable[A]
    I.e., input and output need to be of trait Iterable, but can be of different concrete types.

In turn, this might affect/restrict what you can do afterwards, i.e., when using the output of map. In particular, consider some code such as

```haskell
show . fmap (* 2) :: (Show (f b), Functor f, Num b) => f b -> String
```

Then, you know that this would work for [Int], as List has both a Show instance and a Functor instance. In Scala, you could not be sure, as the iterable returned by List’s map might not have the right trait to be showable. The compiler would tell you, but the code might not be as generic as you think, or might unexpectedly fail to compile for some Iterable instance because its map method returns a weird type.

Obviously, in Julia you can never be sure … but the code might unexpectedly work for combinations you (or anyone else) never thought about.
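Concretely, the container coming out of Julia’s map depends on what went in, and nothing in a signature promises which:

```julia
map(x -> 2x, [1, 2, 3])   # Vector{Int64}: [2, 4, 6]
map(x -> 2x, (1, 2, 3))   # Tuple: (2, 4, 6)
map(x -> 2x, 1:3)         # Vector{Int64}, not a range
```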

Uncertainty over whether, in map(f, xs::A)::B, B must equal A is, I think, just a mistake in the design of Base: letting underspecified interfaces run wild.

If there are multiple different ideas for useful signatures, e.g. a lazy one vs an eager one or a container-preserving one vs a container-replacing one, each function author should make a single decision and clearly specify which one they chose. If users want a different interface, then there can be map, Map, fmap, Iterators.map, etc.
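Base already makes this split in one case, as it happens:

```julia
map(x -> 2x, 1:3)             # eager: materializes [2, 4, 6]
Iterators.map(x -> 2x, 1:3)   # lazy: returns a Base.Generator
```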

Yeah, that’s not good. Generic programming is so much simpler when we actually know what the interfaces are.

1 Like

“Many other type systems” in static languages, that is. Julia is not a statically typed language. If you want every function call to be checked statically, maybe you don’t want a dynamically typed language at all. Which is fine!

For discussion of this kind of thing in Julia, see also Towards typed lambdas by carnaval · Pull Request #10269 · JuliaLang/julia · GitHub and @JeffBezanson’s thesis discussion of arrow types in chapter 4.

See also the many past discussions of things like Why doesn't Julia allow multiple inheritance? - #4 by andyferris and Interfaces for Abstract Types · Issue #6975 · JuliaLang/julia · GitHub

1 Like

I don’t think it’s critical that the check happen statically. But it should happen one way or another.

As is, there’s no way for a Base function author to formally specify the desired properties in a way that method authors can check compliance against, even if they want to, let alone automatically. The experimental Keno/InterfaceSpecs.jl (“Playground for formal specifications of interfaces in Julia” on GitHub) is a nice start on an implementation; other dynamic languages (Clojure, Racket) have made great progress in this area.

1 Like

I would also say that Haskell is right here, i.e., A and B must be the same type in this case.

Well, you also need to identify the right interfaces in the first place (i.e., Haskell is again right that fmap does not belong to Iterable or the like, but is much more general), and then also know their properties, i.e., what you can assume when using an interface. E.g., the “functor law” fmap id = id implies that fmap cannot reorder collection elements.
Designing good generic interfaces takes time; Haskell had several iterations to settle Foldable, Traversable, Monad, etc. Overall, many of Julia’s interfaces are already quite good and well thought out, even if not formalized as in Haskell.
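In Julia terms, that functor law reads map(identity, xs) == xs, which indeed rules out reordering:

```julia
xs = [3, 1, 2]
map(identity, xs) == xs   # true: map may not reorder or drop elements
```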