Kate Gregory's “Naming is Hard: Let's Do Better” and what lessons we can learn from it

Kate Gregory's talk Naming is Hard: Let’s Do Better is not specific to C++, Julia or any other language; it is about how to name things properly in any language. As I currently work as a teacher leading exercise classes for an introduction to programming in C, and am also going through tutorials on Julia, I constantly find myself agreeing with her that naming is hard and that we should work on making it better.

This isn’t a rant. I am just stating my position. If it reads like a rant, it means that I need to work on my English. :sweat_smile: I also won’t be talking about names in core Julia or in specific packages, since this is about the way we learn and teach programming, not how the core of a particular language uses names. Equally important, I don’t know enough about Julia's core to speak about it.

I must stress that my perspective is that of an average programmer who teaches beginners how to start programming. It is convenient to name variables x and y, because it is less typing for me, but the name of an object should tell you something, not just be convenient to type. If x and y are coordinates they are fine, and the same is true of using i just to iterate in a for loop (we work in C, where this is even more true). But recently I try to avoid things like x to name a double or d1 to name a dictionary. First, because I feel that I am encouraging bad practice; second, because I don’t believe it is good practice for teaching (see again Gregory’s talk).

When I introduce a variable of type int in C, just to show how integers in C (to stick to this concrete case) work, I would rather type
int intVar = 0;
than what most materials I have seen use. Do my students understand that it is shorthand for “integer variable”? I hope so, but I should check when we go back to class after the break. Still, I believe it is worth it, since it makes things easier even for me. When I see intVar1 I immediately know that this variable exists only to show how integers work.

When learning Julia I will now change code like
d1 = Dict()
to
dict1 = Dict()
Again, more typing, but less overhead for me. I just read dict1 and I know that this variable is just for illustrating how dictionaries work.

I hope that you will find Gregory’s talk as worth watching as I did.

5 Likes

This is like the debate on the starting index of arrays: should it be zero or one?

There are communities (mathematicians, economists, physicists…) that prefer one-character variable names (often using Greek letters and character modifiers like tildes, bars, dots, hats…) and communities where variable names have descriptive content and hence are longer… In the end neither is better than the other; you just follow the style of the community you are in…

1 Like

Perhaps, but in the context of teaching, there is probably a right answer, or at least a wrong answer. I also think there is a technically right answer for which array indexing best fits a given situation (which is, “it depends”), but the argument most people have on the Internet is much more aesthetic.

But I digress. In terms of variable names used in teaching programming, there is almost certainly a right answer that will depend on your learning objectives. In general, I doubt most programming courses have “tax working memory” as an objective, so having variables say what they are is probably a good heuristic.

But in a lesson where I’m trying to teach the difference between values and variables that point to values, I use x and y because I want to show that I can reassign the label to lots of different things, show that scope matters, etc. And ultimately I show off a very complicated bit of code where the meaning of x changes a bunch, and I want to convince students that they should usually use meaningful names for variables :sweat_smile:

So, context matters …
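A minimal sketch of the kind of lesson described above, written in Python purely for illustration (the lesson's actual language and code are not given in the post, so everything here is hypothetical). The names x and y are deliberately meaningless, so that attention stays on what the labels point to:

```python
# Hypothetical classroom snippet: x and y carry no meaning on purpose.
x = [1, 2, 3]        # the label x points to a list value
y = x                # y is a second label for the same list, not a copy
y.append(4)          # a change made through y is visible through x

same_object = x is y     # one value, two labels

x = "now a string"       # rebinding x leaves the list (and y) untouched


def shadow():
    x = 0                # a local x; the module-level x is unaffected
    return x


local_result = shadow()
```

After this runs, y still refers to the four-element list while x refers to a string, which is the point where meaningful names start to look attractive.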

A different point of view from a great physicist:

“You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing - that’s what counts. (I learned very early the difference between knowing the name of something and knowing something.)” - Richard Feynman

12 Likes

I’m happy to accept all of the decisions made by the creators of a programming language. It’s difficult enough to try and master everything it is able to do and that’s my goal. I can’t take the time to question the names and syntax that were chosen, there’s too much to be learned.

Every programmer should watch Kate Gregory’s talk(s). Bad naming has been a curse since the early days of programming and is one of the reasons why so much code can no longer be maintained or debugged.
Investing time and effort in properly naming things is well worth it (you’ll thank yourself later).

My personal interpretation/rule when I code in C++: the type of a variable indicates its nature (e.g. SymmetricMatrix) and its name indicates its function/role/what it represents in your code (e.g. lagrangian_hessian). Therefore, I don’t think repeating the type in the name is good practice (you have the type for that!)

So dict1 = Dict() should probably be something like phone_book = Dict() (you get the gist).
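To make the type-versus-role rule concrete, here is a small sketch in Python rather than C++ or Julia (the contact data and the lookup_number helper are made up for illustration). The annotation carries the type, so the name is free to carry the meaning:

```python
from typing import Dict

# Type-in-the-name style: says what the object is, not what it is for.
dict1: Dict[str, str] = {}

# Role-in-the-name style: the annotation already states the type,
# so the name can state the role (hypothetical example data).
phone_book: Dict[str, str] = {"Ada": "555-0100", "Grace": "555-0199"}


def lookup_number(phone_book: Dict[str, str], contact_name: str) -> str:
    """Return the stored number, or an empty string for an unknown contact."""
    return phone_book.get(contact_name, "")


ada_number = lookup_number(phone_book, "Ada")
```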

I don’t buy the whole “math symbol” naming style, even though it looks cool for small functions. A program is neither a scientific article nor a user manual. Besides, with larger screens and autocompletion nowadays, we don’t really have any excuse not to use longer descriptive names.

5 Likes

Not sure about that.

“It serves you right to suffer” is not only the name of one of my favorite blues albums, but also the only effective way of making progress that I have witnessed in programming. I have tried to explain/show to a lot of people (students, colleagues, …) that naming things is the most important and difficult thing when you write programs, and I think it has never had any impact.

1 Like

The 1st programming language I learned was FORTRAN IV on an IBM mainframe. Since then I have tended to give integer variables names starting with the characters i-n, often a single-character name for loop variables. Other variable names normally have a maximum of 6 characters. Here and there I allow myself exceptions from these rules when it seems to be important for clarity.

I dropped the habit of names having capital letters only (of course).

1 Like

The hill on which I will die is that the biggest mistake made by the Julia founders was to not always suggest a ‘_’ between words in a function’s name but “only if it would be confusing otherwise”. I lose an inordinate amount of time because I cannot remember if a random programmer decided that some name was or was not confusing enough to merit an underscore, and I am always wrong the first time.

13 Likes

I agree with your first point, but not the last. Long names make code less readable in many cases, in particular when you have complicated expressions with many symbols.

Furthermore, in mathematically oriented code, or in very generic code, the variables are often completely abstract, so the idea of ‘descriptive’ names becomes meaningless. Then x or a or sinθ can be excellent names.
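As a rough illustration of this trade-off (a Python sketch, not code from the thread; both function names are invented, and a non-negative discriminant is assumed), compare the quadratic formula written with textbook symbols and with fully descriptive names:

```python
import math


def roots_short(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0, assuming b*b - 4*a*c >= 0."""
    d = b * b - 4 * a * c
    return (-b - math.sqrt(d)) / (2 * a), (-b + math.sqrt(d)) / (2 * a)


def roots_long(quadratic_coefficient, linear_coefficient, constant_term):
    """The same computation with descriptive names, under the same assumption."""
    discriminant = (linear_coefficient * linear_coefficient
                    - 4 * quadratic_coefficient * constant_term)
    return (
        (-linear_coefficient - math.sqrt(discriminant)) / (2 * quadratic_coefficient),
        (-linear_coefficient + math.sqrt(discriminant)) / (2 * quadratic_coefficient),
    )
```

Both versions compute the same thing; in the short one the familiar shape of the formula is arguably easier to recognize at a glance.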

11 Likes

The upside of this style is that it pressures you towards finding concise names. If your function names are long compound names, it indicates that you should refactor, or perhaps that you haven’t been working hard enough on your naming. It’s an exercise in discipline, imo.

Anyway, it’s perfectly fine to choose underscores in your own code, if you prefer.

Well said.

There are so many facets to naming:

  • In business logic etc it might make sense to stay close to the domain vocabulary.

  • In generic functions, meaningful names might hardly be possible at all, as the level of abstraction is very high and correspondingly few assumptions about an object can be made. This is quite common in Haskell:

    -- from Data.Foldable
    traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
    traverse_ f = foldr c (pure ())
        where c x k = f x *> k
    

    where f can be any function, x basically anything, and mainly the abstract concepts captured by the type classes have proper names (Foldable, Applicative) as well as a precise mathematical meaning.

  • Tacit programming allows writing code without naming arguments at all. Not very common, but sometimes used in Haskell (sum = foldr (+) 0) or APL/J (compute average as +/ % #).
    For a similar reason, I like chains/pipes as not every intermediate result needs to be named:

    @chain df begin
        subset(:sex => x -> x .== "male")
        groupby(:pclass)
        combine(:age => mean)
    end
    

Overall, naming is important, but understanding the concepts behind the names is even more so. Further, finding the right concepts/abstractions is key for generic and reusable code, e.g., what name would you give to the * operator in Julia?
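The chain/pipe idea from the list above can be sketched outside Julia as well. The following Python version is only an illustration under stated assumptions: pipe is a hypothetical helper, not a library function, and the passenger records are invented. No intermediate result gets a name:

```python
from functools import partial, reduce
from operator import itemgetter
from statistics import mean


def pipe(value, *steps):
    """Feed value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical passenger records, loosely modeled on the columns used above.
passengers = [
    {"sex": "male", "pclass": 1, "age": 40},
    {"sex": "female", "pclass": 1, "age": 35},
    {"sex": "male", "pclass": 3, "age": 20},
]

mean_male_age = pipe(
    passengers,
    partial(filter, lambda p: p["sex"] == "male"),  # keep male passengers
    partial(map, itemgetter("age")),                # project to the age field
    mean,                                           # aggregate
)
```

Only the input and the final result are named; the filtered and projected collections stay anonymous.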

6 Likes

If we are going to go down the road of “serving as an incentive to force yourself to do something better” (an argument I am not a great fan of, sincerely), we could then argue that long names help the programmer break each line of code down into smaller parts, as you cannot fit a lot of calls/operators in a line if the variable names take up a good portion of it.

1 Like

That makes it harder to discern the mathematical structure of the expression, though, and greatly hampers readability and comprehensibility. Nothing is better for readability than conciseness.

At any rate, if your function has a compound name, underscore or not, at least consider some way to refactor.

1 Like

Good point, concise code is not just better, i.e., faster and easier to change, for exploration, but can also reveal structures that you might not have found otherwise (see here for an APL example).

I would disagree (at least partially) here. Conciseness is good when it’s vertical (the number of lines in a function), not necessarily horizontal (the width of the lines).

A few lines randomly taken from a Fortran library:

if(plen)xp=xp+sign(alpha,dble(lspj))
rpu=max(bup-blp-alpha,0.D0)
if(alpha.le.rpu)then
 rpu=alpha
 lspj=lspj
else
 lspj=-lspj
endif

Very concise. Yet unreadable. The point of programming is not to write one-liners or code that is as compact as possible; among other things, it is to communicate with other programmers or a future self.

5 Likes

That’s not concise, that’s terse.

I am cheating a bit with my definition. By concise I mean both clear and brief, it’s the point where you strike the balance between brevity and clarity, so it’s the optimum, by definition.

Exactly where that optimum lies is of course a matter of debate. But I’m using that term to emphasize that there is a trade-off, and that verbosity simultaneously clarifies and obscures.

I think long lines often have a very negative effect on code readability. I’ve spent a lot of time recently improving code readability by shrinking names, exactly to make incomprehensibly long lines easy to read at a glance.

4 Likes

Despite having some useless lines, this code is mainly lacking context, i.e., within a context/domain where lspj, rpu etc. have some meaning it would be perfectly fine. As a random part of a 500+ line function it would not be very understandable.

1 Like

Then we simply expect different things from a program. For me, lspj will never be good enough a variable name, context or not.