Is a simple, beginner style with named parameters and no unnecessary type annotations acceptable?

I am trying to write the simplest code possible for users with little programming experience. I don’t mind leaving room for future optimization, but I would hate to teach programming patterns that turn out to be terrible in the medium term. My general question is whether writing code without any type annotations (unless needed for multiple dispatch) and passing parameters as named tuples is “good enough” for most performance needs and is likely to be type stable in the v1.0 timeframe. To be specific, can anyone tell me whether the following code would be just as type stable and performant with and without type annotations (post v1.0, with named tuples)?

using NamedTuples  # will drop in the move to v0.7
using Distributions

function mytest(v)
    # weighted normal density plus an offset; all parameters arrive in p
    f(x, p) = p.a * pdf(p.dist, x) + p.b

    d = Normal(0, v)
    params = @NT(a = 2.0, b = 2, dist = d)
    x = 0.1
    y = f(x, params)
    println("output = $y")
end

mytest(0.2)

(and, yes, I know about @kwargs which is cool but would prefer to keep the code as simple as possible)

Yeah, that’s a good long-term pattern to use which is both performant and slick.
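
One way to check the type-stability question directly is with `@inferred` from the `Test` standard library. The sketch below assumes Julia 1.0 or later, where named tuples are built in, and it replaces `pdf(Normal(0, v), x)` with a hand-written `normal_pdf` so that it runs without Distributions. The names `normal_pdf` and the field `σ` are illustrative and not from the original post.

```julia
using Test  # standard library; provides @inferred

# Hypothetical stand-in for pdf(Normal(0, σ), x) from Distributions.jl
normal_pdf(σ, x) = exp(-x^2 / (2σ^2)) / (σ * sqrt(2π))

# Same shape as f in the question: no type annotations anywhere
f(x, p) = p.a * normal_pdf(p.σ, x) + p.b

function mytest(v)
    params = (a = 2.0, b = 2, σ = v)  # built-in named tuple, no @NT needed
    return f(0.1, params)
end

# @inferred throws an error if the inferred return type
# does not match the concrete type of the actual result
y = @inferred mytest(0.2)
println("output = $y")
```

If the `@inferred` line runs without an error, inference worked out a concrete return type for the unannotated code.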

An omitted type annotation is the same as specifying Any. Unless you use dispatch, it doesn’t matter. The same goes for specifying that an argument is Real: either way, the methods are compiled for the concrete argument types that are actually passed.
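
A small sketch of this point, using made-up function names (`double`, `double_any`, `double_real`): the unannotated method and the `::Any` method have the same signature, and each call is still specialized on the concrete argument type.

```julia
double(x) = 2x             # no annotation
double_any(x::Any) = 2x    # explicit ::Any, same signature as above
double_real(x::Real) = 2x  # restricts which arguments are accepted

# An unannotated argument is recorded as Any in the method signature
@show first(methods(double)).sig == Tuple{typeof(double), Any}
@show first(methods(double_any)).sig == Tuple{typeof(double_any), Any}

# The result type follows the concrete argument type in every variant
@show typeof(double(1))
@show typeof(double(1.0))
@show typeof(double_real(1.0))
```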

Teaching a class of grad students with heterogeneous prior programming experience, I found that

  1. structs were OK and well-liked for organizing values that belong together, with the only drawback that you have to restart the session if you want to redefine them (unless working in a module),
  2. dispatch on types (w/o parameters) was more difficult initially, but understood with a bit of work,
  3. type parameters are too advanced for an intro course.

I consider named tuples an “anonymous type” (I can’t wait for v0.7, I plan to re-organize so much code around them), and I think that they could be a sweet spot for efficient yet clean code.
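
To illustrate the “anonymous type” view, here is a minimal sketch assuming Julia 1.0 or later with built-in named tuple syntax. The variable names are arbitrary.

```julia
p = (a = 2.0, b = 2)

# Field names and concrete field types are part of the tuple's type
@show typeof(p)
@show isconcretetype(typeof(p))

# "Redefining" is just building a new value, so no restart is needed
q = merge(p, (b = 3, c = "label"))
@show q
```

Unlike a struct, nothing has to be declared up front, yet the compiler still sees a concrete type for every field.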

I think that if you have to compromise between performance and clarity in an intro course, go for clarity. It is OK to leave 2–10x speed gains on the table; they can learn about them later on.

Moreover, it is unclear whether type parameters (or even variable declarations) would speed up the code at all for most library users. Inference seems pretty solid, and users unfamiliar with generic programming may end up choosing the wrong types and breaking things like automatic differentiation, calling a suboptimal method under multiple dispatch, forcing something that is better left lazy to be computed, etc. Am I wrong to worry about this?

The thing I don’t quite understand is what happens if they declare structs without types…is that a bad habit to get into (especially in a world with named parameters)?

Except when needed to distinguish between different methods of a function, or when the code will only work with a specific type or abstract type (here, I’d always go with the highest type up the tree that the code would work for, usually things like AbstractString, AbstractFloat, Real), I never add types to the parameters. Julia will generate better (or at least the same) code without them.

Structs without field types, though, can be a big performance killer.
If you don’t want to show them how types can be parameterized yet, then maybe it would be better to teach them to write ::Any instead of nothing, and explain that this will not perform as well but is good enough for starting out. Having the ::Any will make it much easier to search for those fields later, when you want to optimize them.
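
A rough sketch of the difference, with hypothetical struct names (`Loose`, `Tagged`, `Tight`) and a toy function `total`. It uses `Base.return_types`, an introspection helper rather than something students would need to write, to show what inference concludes in each case. `fieldtypes` assumes Julia 1.1 or later.

```julia
struct Loose        # fields without annotations are implicitly ::Any
    a
    b
end

struct Tagged       # the suggestion above: spell out ::Any so it is searchable
    a::Any
    b::Any
end

struct Tight{T}     # parametric version, something to introduce later
    a::T
    b::T
end

total(s) = s.a + s.b

@show fieldtypes(Loose)
@show fieldtypes(Tagged)
@show fieldtypes(Tight{Float64})

# What inference can conclude about the result of total for each struct
@show Base.return_types(total, (Loose,))
@show Base.return_types(total, (Tight{Float64},))
```

With `Loose` or `Tagged` the compiler cannot know what `s.a + s.b` returns, while with `Tight{Float64}` it can; both versions still compute the same answer.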

The type is Any. Yes, currently that is a bad habit if you want performant code. But keep in mind that there is a trade-off between generating code for various type combinations, and compilation time.

As I said, I would ignore this whole issue for an intro course. Some things will be fast, some things won’t be as fast as they could be, but will still be fast. I assume that in class, you would be solving simple problems, and compilation time would dominate or at least be significant.

Julia is both

  1. a language with convenient and clear structure, a pleasure to program in,
  2. a language that can be optimized to squeeze every last bit of performance out of the CPU.

Discussions frequently focus on (2). But it is perfectly fine to emphasize (1) in an intro course.

I would not use something which will soon be outdated and, more importantly, which will not be used in other learning materials (textbooks on Julia, Stack Overflow questions and answers, examples in the documentation of Base, etc.).

AFAICT they will become part of the language soon (with a slightly different syntax). I also expect that their use will proliferate rapidly, especially in one-off code.

“Outdated” refers to the syntax and the use of a package for this. Named tuples themselves are of course not outdated; they are just becoming part of the language soonish.

Right, so there is nothing wrong with using it now. All you would have to do to update it for 1.0 is remove @NT everywhere.

Perhaps my opinion is born of a time when I transitioned from C++ to Python, but I’ve always considered type annotations to be a luxury rather than a burden. Even forgetting about multiple dispatch for a moment, not only do type annotations catch baffling Python-style errors resulting from trying to execute code on a type it was not intended for, but, perhaps more importantly in this case, they tell people who are reading the code what they are looking at. (E.g., if you had written mytest(σ::Real), everyone would probably guess the meaning of the argument before ever reading the code.)
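
As a small illustration of the error-catching point, the sketch below compares an annotated and an unannotated version of a toy function. The names `spread_typed`, `spread_loose`, and `failing_function` are made up for this example.

```julia
spread_typed(σ::Real) = 2σ   # annotated
spread_loose(σ) = 2σ         # unannotated

# Call f(arg) and report which function the MethodError names, if any
function failing_function(f, arg)
    try
        f(arg)
        return nothing
    catch err
        err isa MethodError || rethrow()
        return err.f
    end
end

# Annotated: the error points at spread_typed itself
@show failing_function(spread_typed, "0.2")

# Unannotated: the error surfaces one level down, at the multiplication
@show failing_function(spread_loose, "0.2")
```

In both cases the bad call fails; the annotation only changes where the failure is reported, which matters more as the call stack gets deeper.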

I don’t understand this idea that code is somehow simpler if the programmer pretends that all data types are equivalent. Even if you are teaching people to program, one of the very first concepts that they need to learn is that a Float64, a UInt64, and a String are all different things. The ability to omit type annotations is certainly very nice as an alternative to writing something horrible like Union{T,AbstractVector{T},Union{T,Missing}} where T<:Integer, or during certain prototyping processes, but usually type annotations are a convenient aid for the programmer and, I dare say, even a good teaching aid.

Sorry, I’m probably taking this a bit too seriously. Your simple example seems reasonable; I just couldn’t resist making this point.

But you were probably a more experienced programmer at that time than the “users with little programming experience” in @jlperla’s course.

Julia has a ton of elegant and useful features. Their utility is not in question. The problem is how to simplify them in the first pass — just like universities teach Calculus I to freshmen, not Advanced Measure Theory IV. (I made that course up, but it sounds cool. Only saw II in the wild).

You are raising an important point, but put yourself in the mind of a new user who has never programmed before (or at best done a little MATLAB).

Types, excessive use of structs with constructors, generic parameters, etc. are syntactic noise that makes the algorithms look much more complicated than they need to be. Of course, none of this applies to writing libraries, where careful typing is essential.

Also, I think in the C++ world (and certainly in things like Haskell) there has been a trend to let inference do its magic wherever possible.

I agree that data structures are probably not a day 1 concept; I was only arguing for type annotations. I’m not sure of the context, but I’d think that at least math, physics, and engineering students would immediately appreciate the difference between scalars, vectors, and matrices, whether they’ve programmed before or not. They’d probably also immediately appreciate concepts such as non-negative integers or the difference between real numbers and text.
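
A minimal sketch of how those distinctions can appear in code, using only abstract types and no type parameters. The function name `describe` is invented for illustration.

```julia
describe(x::Real) = "scalar"
describe(v::AbstractVector) = "vector of length $(length(v))"
describe(A::AbstractMatrix) = "$(size(A, 1))×$(size(A, 2)) matrix"
describe(s::AbstractString) = "text"

println(describe(3))
println(describe([1.0, 2.0]))
println(describe([1 2; 3 4]))
println(describe("hello"))
```

Each annotation here both selects the method and tells the reader which mathematical object the method expects.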

I can’t help myself.

I believe teaching with explicit types for functions is just good practice, as it serves as a way to annotate the code: this function takes this kind of argument because… (e.g., it has to be a real number in order to compute the arithmetic mean).

The problem I have seen here is that they may choose a much more concrete type than they really need, or too generic a type (which is a problem for collections of non-concrete things).

A type that is too specific can make it impossible to apply automatic differentiation, etc., and a type that is too generic in a collection can give horrific performance.

Types are great once you know what you are doing.
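
Both failure modes can be sketched in a few lines. In this example `Rational` merely stands in for any “special” number type (such as the dual numbers an automatic differentiation package would pass in), and the function names are hypothetical.

```julia
square_strict(x::Float64) = x * x  # more concrete than the code needs
square_open(x) = x * x             # works for anything that supports *

# Too specific: the strict method rejects a perfectly good number type
r = 3 // 4
@show square_open(r)
@show applicable(square_strict, r)

# Too generic in a collection: the element type is abstract,
# so each element's concrete type must be looked up at run time
loose = Real[1.0, 2.0, 3.0]
tight = [1.0, 2.0, 3.0]
@show isconcretetype(eltype(loose))
@show isconcretetype(eltype(tight))
```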

The difference is really between the “teach students Java so they are explicit and know what’s going on”, vs “teach students Python so they can get things going quicker and get to the real meat of the problem”. In most universities, the switch to Python has occurred in the CS100 course. Proof by majority, but I assume that the people who made the decision did so quite consciously.

While I enjoy having had the experience of statically-typed languages, it was not necessary when learning algorithm theory. Additionally, in a mathematics or scientific course, teaching programming gets in the way of the actual science. The ideal would be for programming to be so easy that it’s never mentioned and just used as a tool. In that case, the easiest approach is probably the best.

The ironic thing is that type annotations actually seem more meaningful in the math and science context than they do in the algorithms context. If you are learning to write quicksort, you might think type annotations are silly: of course they are integers, why would they not be integers, let’s not worry about this now. In the math or science context, however, the distinction seems more relevant, especially because of the distinction between scalars and tensors. This isn’t programming getting in the way of science; it’s programming acting as a concrete instantiation of a mathematical abstraction. I’d argue much the same for things like various types of integers and strings. What is not important at this point is, e.g., the distinction between Float64 and Float32, but that’s not really what we’re talking about.

As for Python, when I first got introduced to it, I kept hearing people say things like “you don’t even have to worry about the types, it’s so great you just write natural code”. This is just silly. Of course you have to worry about types especially since when using Python you are immediately confronted with numpy objects or some other abstraction that really lives in C code. It didn’t take me very long to start getting inscrutable errors resulting from me passing some numpy array of the wrong rank or some other such silly thing (I was particularly vulnerable to some of the pitfalls as I was coming from C++). Suffice it to say when I started using Python I was not in the slightest bit impressed by the fact that every reference to data types is annoyingly hidden.
