Kate Gregory’s “Naming is Hard: Let’s Do Better” and what lessons we can learn from it

It doesn’t seem like a good name, agreed, but with no context, it’s hard to be sure. Domain knowledge may change the calculus.

For example, SNR is a far better name than signal_to_noise_ratio for those working in the domain. Even in everyday speech, everyone I know says “ess-enn-are” because it’s a better name.

FYI, this goes by the name of Hungarian notation – that should help you search for opinions for/against :slight_smile:

PS: Interesting meta-comment on non-descriptive names :upside_down_face:

I can still remember first reading programming tutorials and being so utterly confused about why everything was MyClass or $my_var when there was also self.x and/or this->y magic. Like who is me, myself, and that? Why are we all so special!? Who is talking to who and what is on second base? Of course, folks were trying to helpfully show that these “my” things were the arbitrary names, but that just flew right over my head.

It’s so easy now to see what is “magic” and what is a stand-in that it can be hard to remember how those first exposures to arbitrary exemplar names read and feel. For example, I’d argue that dict1 = Dict() is perhaps the worst of all worlds, for exactly that reason: why is it dict1 and not dict or even Dict? It’s hard to come up with good simple examples that aren’t so trivial they lose all meaning, but that’s where learning happens. The very best examples aren’t even generic; they are about something your audience cares about. So even better than phone_book = Dict("Jenny"=>"867…") might be ticker_lookup = Dict("MSFT" => "Microsoft") or chemical_compounds = Dict("water" => "H₂O").

why is it dict1 and not dict or even Dict

I always thought the opposite: dict1 looked non-specific enough that I’d understand it was probably arbitrary, especially if there was also a dict2 = Dict(). But dict = Dict() would have confused me into thinking it always had to be a pair of something = Something().

I used to research eye movements; there are probably papers on how experienced and inexperienced coders scan code. For example, to understand what a function does, you have to jump to the arguments first, so that you know which names in the function body are essentially arbitrary and which reference external things.

My strategy for variable names within functions is to keep those functions so simple that it does not matter. Whenever possible, a function should Do One Thing™, be short, and be easy to take in at a glance. This is not always possible, but it is an ideal to strive for; when it holds, d1 or dict1 or x or whatever is fine, because it is “local”.

Function and type names require more thought. But I don’t let this break the flow. I can always rename when I refactor, which happens anyway.

Sure, striving for good naming is a good habit, but in my experience it is almost impossible to get right ex ante for a nontrivial project. Just be ready to refactor often and review the code before a major release.

That said, I feel sorry for the people who use C++. So many things to name, all those methods and classes and whatnot.

To me, the biggest problem with dict1 in a tutorial is that it makes me think there might be a dict2 or a dict3 or even a dictN, or, heaven forbid, eval(Meta.parse("dict$i")). I’ve seen so many cases where folks get tied in knots trying to use numbered variables as arrays.
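
A minimal sketch of the alternative in Julia, assuming a tutorial that genuinely needs several dictionaries: keep them in one Vector and iterate or index into it, rather than creating dict1, dict2, … or generating names with eval. The names inventories and total_apples, and the data, are hypothetical.

```julia
# Instead of dict1, dict2, dict3 (or eval(Meta.parse("dict$i"))),
# keep the dictionaries in one collection and index into it.
# `inventories` and its contents are hypothetical illustration data.
inventories = [Dict("apples" => 3), Dict("apples" => 5), Dict("apples" => 0)]

# Loop over the collection directly; no numbered names are needed.
for (i, inventory) in enumerate(inventories)
    println("store $i has $(inventory["apples"]) apples")
end

# Aggregate across all of them with ordinary collection functions.
total_apples = sum(inventory["apples"] for inventory in inventories)
```

The "i-th dictionary" is then just inventories[i], and adding a fourth store means pushing onto the vector rather than inventing a new variable name.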

Of course, everyone’s first impression will differ based on their own history, but one of the main points of the talk is the importance of good names in both tutorials and professional real-world code, because people learn from both and aspire to write the latter.

Oh yes, I also remember asking on some VBA forums many years ago “how do I make 100 variables” x1 to x100. But other languages don’t make arrays as comfortable as Julia does. One of the things I most hated learning was the classic for-loop incantation var i = 0; i<=100; i++, or even Julia’s for i = 1:100, because it puns on =. I find for i in 1:100 so much clearer; I’m glad we have that.
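
For readers coming from other languages, it may help to know that Julia accepts =, in, and ∈ interchangeably in a for-loop header, so writing in is purely a readability choice. A small sketch; the three function names are made up for illustration.

```julia
# Three spellings of the same loop header; Julia treats them identically.
function sum_with_equals(n)
    total = 0
    for i = 1:n      # `=` spelling, familiar from MATLAB/Fortran-style loops
        total += i
    end
    return total
end

function sum_with_in(n)
    total = 0
    for i in 1:n     # `in` reads as "for each i in the range 1 to n"
        total += i
    end
    return total
end

function sum_with_element_of(n)
    total = 0
    for i ∈ 1:n      # `∈` (typed \in<tab>) is the Unicode synonym for `in`
        total += i
    end
    return total
end
```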

@kevbonham hit the nail on the head with “it depends” because “context matters”.

I needed to code up Planck’s Law a while back, so here are three versions of the same function.

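```julia
# Physical constants in SI units (exact values in the current SI)
const h = 6.62607015e-34   # Planck constant, J⋅s
const c = 2.99792458e8     # speed of light, m/s
const k = 1.380649e-23     # Boltzmann constant, J/K
const plancks_constant, speed_of_light, boltzmann_constant = h, c, k

# case 1
spectral_energy_density(frequency, temperature) = 2*plancks_constant*frequency^3/speed_of_light^2 * 1/(exp((plancks_constant*frequency)/(boltzmann_constant*temperature)) - 1)
# case 2
B(freq, temp) = 2*h*freq^3/c^2 * 1/(exp((h*freq)/(k*temp)) - 1)
# case 3
B(ν, T) = 2*h*ν^3/c^2 * 1/(exp((h*ν)/(k*T)) - 1)
```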

Case 1 comes in at a whopping 175 characters. For reference, the SciML style guide suggests a 92-character line limit. Sure, it’s easier to know what the variables are, but I think it’s the hardest to tell what the math is doing because the line is so long.
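
Since the complaint about case 1 is mostly line length, one option is to keep the descriptive names but split the expression into a few named intermediate quantities. A minimal sketch in Julia, assuming SI units; the intermediate names photon_energy, thermal_energy, and prefactor are hypothetical choices, and the prefactor uses the standard form of Planck’s law, 2hν³/c².

```julia
# Physical constants in SI units (exact values in the current SI).
const plancks_constant = 6.62607015e-34   # J⋅s
const speed_of_light = 2.99792458e8       # m/s
const boltzmann_constant = 1.380649e-23   # J/K

# Case 1 with descriptive names, split so that no line exceeds 92 characters.
# The intermediate names are illustrative, not part of the original code.
function spectral_energy_density(frequency, temperature)
    photon_energy = plancks_constant * frequency
    thermal_energy = boltzmann_constant * temperature
    prefactor = 2 * plancks_constant * frequency^3 / speed_of_light^2
    return prefactor / (exp(photon_energy / thermal_energy) - 1)
end
```

Every line stays well under the 92-character limit, and the intermediate names arguably document the physics (photon energy compared with thermal energy) better than the one-liner does.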

Case 2 assumes that readers know what the constants mean, which seems a fair assumption: otherwise, why would they be looking at code for the Planck function? Besides, if one sets

```julia
const c = 3e8 # speed of light in m/s
```

above, then there is no need for the variable name to be so explicit.

But I prefer case 3. It uses Unicode, which some people don’t like, but it makes it immediately obvious how the variables relate to the equation on the wiki page. Also notice that I didn’t go crazy and add the “proper” subscripts!

One theme common to all three, though, is that each variable, regardless of length, has a specific, well-known meaning. The shortened and Unicode versions map directly onto well-known constants or onto symbols widely used in physics.

A name gets its meaning in relation to others. The relationship between names is determined from the observed behaviour of objects for which names have been given in the past. The name of the bird, thus, is like a bridge over which knowledge can be absorbed as the network of relations is already present.

Even in physics, naming is essential for making the correct associations in a paper or internal report. Once, as a young student, I had a hard time understanding what an author meant by something like \chi^i_i, where i was the imaginary unit rather than an index; that was confusing right after a course on tensors.

This is starting to wander into the realms of philosophy and metaphysics. To quote Hobbes:

And from Mill’s Of Names:

Since, however, the introduction of a new technical language […] is extremely difficult to effect and would not be free from inconvenience even if effected, the problem for the philosopher, and one of the most difficult which he has to resolve, is, in retaining the existing phraseology, how best to alleviate its imperfections. […] And the question of most nicety is how to give this fixed connotation to a name with the least possible change in the objects which the name is habitually employed to denote, with the least possible disarrangement, either by adding or subtraction, of the group of objects which, in however imperfect a manner, it serves to circumscribe and hold together, and with the least vitiation of the truth of any propositions which are commonly received as true.

If you google “theory of names”, you’ll find lots of other light reading on this topic from our friends in philosophy departments.

Fair enough! It looks like you’re lucky enough to work in a field where notations are pretty much universal and unchanging :slight_smile:
I work in (nonlinear) optimization, where there are as many different notations as there are authors, hence the need for unambiguous names in the code.

For me, it isn’t one or the other. When I’m coding equations from a journal article, I try to follow the article’s nomenclature (often one- to three-letter symbols, frequently in Unicode; I love this feature in Julia). This makes it easy for readers to go back and forth between the code and the article. When I’m writing more general code, descriptive names clarify the code’s intent and seem like a better choice.
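
A minimal sketch of combining the two approaches, assuming the Planck example from earlier in the thread: keep the article’s short symbols in the code, and record what each symbol means (with units and the source equation) in a docstring. The docstring text and constant definitions are illustrative assumptions, not code from the thread.

```julia
# Physical constants in SI units, named as in the usual statement of Planck's law.
const h = 6.62607015e-34   # Planck constant, J⋅s
const c = 2.99792458e8     # speed of light in vacuum, m/s
const k = 1.380649e-23     # Boltzmann constant, J/K

"""
    B(ν, T)

Spectral radiance of a black body at frequency `ν` (Hz) and temperature `T` (K),
in W⋅sr⁻¹⋅m⁻²⋅Hz⁻¹, following the notation of Planck's law:

    B(ν, T) = 2hν³/c² ⋅ 1/(exp(hν/(kT)) - 1)

`h`, `c`, and `k` are the Planck constant, the speed of light, and the Boltzmann constant.
"""
B(ν, T) = 2 * h * ν^3 / c^2 / (exp(h * ν / (k * T)) - 1)
```

Typing ?B at the Julia REPL then shows the mapping from symbols to meanings, so the short names stay readable for people who don’t have the article open.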

I definitely agree that naming is important! I want to take the opportunity to shamelessly advertise block 2 of the Good Scientific Code Workshop, which places strong emphasis on naming things for clarity and also covers most of the advice in Kate Gregory’s talk (although most examples in my workshop are in Julia):

Luckily, most of the heavy mathematics I do has straightforward notation (hence the Planck’s law example). But I do have to spell out a lot of the peripheral stuff for myself; the file I’m currently working in has variables like image_median, metadata, and label_properties, which I like more than any shortened version.

I think someone mentioned this above, or maybe it’s in the SciML style guide, but I just try to be consistent (I always name file paths filepath). No one else reads my code, so this is all just so that I can remember what I was doing six months ago. Though even clear labeling doesn’t always help with that…

Sure, as long as you don’t go to the seminar and ask questions :wink:
