Naming: Remove all underscores no matter what?


Oh that makes me so happy! :smile:


Thanks so much for this. I think at times I struggle with understanding the changes that happen on GitHub, but this makes a lot more sense to me now. Looking at that list of possible imports is far less scary than what I had built up in my mind! I still want more _ but I feel less like the choices are being made in an arbitrary way just to remove spaces between words :slight_smile:


Removing underscores represents an aesthetic preference that, to the decision-makers, is more important than readability.

My approach is to use underscores except when there is a direct analogue in Base (like the is* functions).


I dispute whether this is really the case. I certainly agree that you wouldn’t want stuff like besselfunctionofthesecondkind(a, x), but there’s nothing in Base that looks like that, which is the whole point. Multiple dispatch is particularly conducive to short function names, hence the huge library of methods in Base, the vast majority of which have names of no more than two words. I agree that particularly verbose function names, when necessary, should have underscores.


I do not understand why the Julia core team decided to sacrifice readability for beauty. Beautiful code is important to me, but not separating words, whether with camel case or with underscores, sacrifices readability. I don’t understand this decision. :cry:


OT X-ref for the interested: Run the half-life of code analysis on Julia?


I look at it differently. I don’t think it reflects a preference for mashing words together – instead it is a convention that encourages you to consider how you name identifiers in your code, and, more generally, how you organize the concepts you’re working with.

Strive to choose concise names, not too short, not too long. If you need underscores, it most likely means that you can work harder to find a better name, or perhaps that you are mixing together two concepts that should be separated.
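
A small sketch of what "two concepts that should be separated" might look like in practice. The compound helper name below is hypothetical and made up purely for contrast; the `by` and `rev` keywords are the ones Base's `sort` already provides.

```julia
words = ["pear", "fig", "banana"]

# Hypothetical helper: the name bakes two concepts (the sort key and the
# sort order) into a single identifier, so it needs underscores to be legible.
sort_by_length_descending(v) = sort(v, by=length, rev=true)

# The same operation with the two concepts pulled out of the name and
# expressed as keyword arguments of Base's sort.
longest_first = sort(words, by=length, rev=true)
```

Whether this reads better is of course a matter of taste, but it shows the kind of refactoring the convention is meant to nudge you towards.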

It reminds me of the principle (which has been mentioned many times) of implementing as much of the Julia language as possible in Julia itself, even when dropping down to C or something else would be easier and more efficient. It sounds like a potential handicap, since Julia could be faster with more C code in it, but in the long run it’s better. Analogously, discouraging underscores forces you to work harder and do better.

I think that if you stick to the convention, and put in the extra work, your code will in the end be more readable.


I don’t think that ANY language can become more readable by omitting spaces and putting everything in one word. In science, you have subscripts and superscripts. You don’t write everything in one word. While programming, having at least subscripts (by using an underscore) is essential for me. Nothing improves if you remove this degree of freedom.


I don’t think anyone forces you not to name your own functions however you like. I hope this helps?


But that’s the point. Don’t put everything in one word.

In science, you have single-letter identifiers, mostly. You need a lot of subscripts to make up for that.


I think Julia is really the wrong language to attack when it comes to degrees of freedom when writing code:


I don’t think anybody is saying that you shouldn’t add underscores to function names that you feel are too verbose; you can and should do so. The point is that the vast majority of what’s in Base are short names like isbits, randstring, eigvals. This occurs precisely because Julia has multiple dispatch as a core paradigm. The argument is that these types of short names are better (and, in my opinion at least, more readable) if they stay short. If you have a variable with a subscript, by all means, add it as either a Unicode subscript or an underscore. I’m sure there are a few cases where the devs got overzealous for the sake of consistency, but for the vast majority of what’s in Base I really don’t see readability as being an issue.


At the risk of fanning the flames that I have started: if the convention were to ALWAYS put underscores between the words of a multi-word name, why would that not also lead people to choose names composed of fewer words? If the name is find_shortest_path instead of findshortestpath, are you saying that because the latter is far harder to read, it leads to better names? Do people make API choices simply because they are scared of spaces and need to be cruel to end users?

Generally I do find that the core devs are careful with the names they pick, but as a rule the current naming conventions and the taste for removing all underscores lead to a real ad hoc mix of really nice short names, mashed-together names of varying readability, and names with underscores. We are not just moving towards the great short names, but will ultimately have all three, which I think is the worst outcome.


Why is multiple dispatch related to this? I would bet that for most of the names we are dealing with, we effectively have single dispatch, so they are really no different from names in most OO languages (which as a rule have longer names), or in languages with no real dispatch such as MATLAB, where many of these mashed-together names come from. I would argue that mashing as a taste simply comes from the DOS-like days when we had limited characters, not because the lack of underscores makes anything more readable.

Most newer languages (Java onward) avoid this kind of artificial constraint, in my experience. I think it is simply that people hate underscores: you would never have this issue with camelCase, since people are happy to change the capitalization as long as they don’t have to add a character.


Because much of the functionality of the function is specified not only by its name but by the types used as input. The opposite extreme would be Python where function arguments are a free-for-all and you need incredibly verbose names. Intermediate ground would be Java or C++ where you have some dispatch and most functions belong to a class.
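
A minimal sketch of the point about types carrying part of the meaning. The `Circle` and `Rectangle` types and all function names here are made up for illustration; nothing below comes from Base.

```julia
# Hypothetical types, introduced only for this example.
struct Circle
    radius::Float64
end

struct Rectangle
    width::Float64
    height::Float64
end

# One short name; the argument type selects which method runs.
area(c::Circle) = pi * c.radius^2
area(r::Rectangle) = r.width * r.height

# When the arguments are untyped, the type information tends to
# migrate into the function name instead.
circle_area(radius) = pi * radius^2
rectangle_area(width, height) = width * height

a = area(Rectangle(2.0, 3.0))
```

This only shows dispatch on a single argument, so it does not by itself settle whether multiple dispatch shortens names more than ordinary single dispatch would.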


There’s nothing automatic about this, of course, but I at least take this convention as a way of forcing myself to think once or twice more, about how I can avoid long ugly compound names, without resorting to underscores.

Perhaps many of those just don’t have the best possible names yet?

I guess you can never have just short, beautiful expressive names, there will always be warts. But by choosing this convention, the devs are at least consciously pushing the language in a particular direction, and I think that the balance of advantage will ultimately prove to be on the positive side.


To clarify, we don’t insist that everyone write Julia code without underscores. It is a discipline, beautifully explained by DNF, to which Base Julia strives to adhere in its public API. In internal functions and in my own code I use underscores all over the place because the factoring of the code doesn’t matter so much. Not all code needs to be reduced to its most atomic elements of design, but that’s what we aim for in the Base language.


Okay my last post as these kinds of things are always a pit. It has been very helpful for me to understand how people see this, even if I fully disagree. I don’t want to just end with the impression that I have not found this discussion interesting!

The thing I really cannot understand is how removing underscores is seen as somehow making the variables more “atomic” / “minimal” or pure. I see it as purely syntactic: I see no “purity” change between is_dense, isdense, and isDense; they are simply different ways of separating words in a variable name.

If you move from is_dense to isdense you have not found a more atomic expression of your code, you simply removed an underscore. If you need to use an underscore, don’t cheat, use it. If you somehow find a more atomic name, great! You have found real beauty. Base has done this latter kind of reduction many times, but I think we are getting into a state where we are cheating instead. Separate words should have underscores; removing them does not magically get rid of the need (because you are combining distinct words).


You can do things like multiplying, adding and dividing matrices and vectors without using LinearAlgebra. The things that are exported from LinearAlgebra and not available in Base are things like special matrix types, factorization types, and specialized linear algebra operations like axpy!. Really basic linear algebra operations still don’t require any imports.
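
A short sketch of the split described above, assuming a fresh Julia 1.x session. The variable names are arbitrary.

```julia
A = [1.0 2.0; 3.0 4.0]
v = [1.0, 1.0]

# Basic operations work with no `using` statement at all.
product = A * v      # matrix-vector product
total = A + A        # matrix sum
halved = A / 2       # division by a scalar

# axpy! is exported from LinearAlgebra, so it needs the import.
using LinearAlgebra

x = [1.0, 2.0]
y = [10.0, 20.0]
axpy!(2.0, x, y)     # overwrites y in place with 2.0 * x + y
```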


A while ago, I made this list of lowercase exports split into words where they potentially could be:

The thought was that maybe we could standardize on “using underscores consistently”. The trouble with this turns out to be that there are many very well established function names, especially in math, that do not have underscores. This leaves three options:

  1. Adopt a “use underscores consistently to separate words” policy, but be the only language around that spells abspath as abs_path, dirname as dir_name, basename as base_name, lcfirst as lc_first, linspace as lin_space, logdet as log_det, logabsdet as log_abs_det, nullspace as null_space, ordschur as ord_schur, pinv as p_inv, randn as rand_n, readline as read_line, repmat as rep_mat, sortrows as sort_rows, etc. What about acosd? Should that be spelled a_cos_d? If not, then why not? What exactly constitutes a word?

  2. Use underscores except when there’s a strong legacy reason for not doing so. However, this is a pretty subjective judgment call, and leaves the language with a wildly inconsistent experience where – much like English spelling – you have to know the provenance of a name in order to guess how to spell it.

  3. Avoid underscores and try to factor APIs as much as possible so they aren’t missed. This is what we do.
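
To make option 3 a bit more concrete, here is a sketch of how a few of the compound names from option 1 can be expressed through more general functions. As far as I know these are the replacements that Julia 1.0 ended up with, but that mapping is my own reading of the 1.0 deprecations and is not stated anywhere in this thread.

```julia
# linspace(0, 1, 5): the endpoint and the number of points become
# keyword arguments of the general-purpose range.
xs = range(0, stop=1, length=5)

# repmat([1 2], 2, 1): covered by the more general repeat.
tiled = repeat([1 2], 2, 1)

# sortrows(A): the dimension becomes a keyword argument of sortslices.
A = [3 1; 1 2]
sorted = sortslices(A, dims=1)
```

In each case the second word of the old name has moved into an argument, so there is no longer a word boundary to argue about.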

I think it’s easy to focus on a few ugly examples of the “no underscores” approach that are not great (e.g. nbytesavailable), but it’s much harder to actually come up with a consistent policy that doesn’t lead to even more absurd cases (criticism is easier than design). Mathematica takes the approach of spelling out absolutely everything, which is very predictable and very characteristic. The Mathematica-esque names of some of the above names would be something like DirectoryName, FileBaseName, LinearSpace, RepeatMatrix, SortRows and so on. That’s a consistent policy, and a respectable choice, but being equally explicit and verbose (with underscores or camelcase) would lead to a very different feeling language than Julia.