Word separation can be indicated by underscores (‘_’), but use of underscores is discouraged unless the name would be hard to read otherwise.
Okay, so which names are hard to read to you?
leftposition
hasleftposition
mappingquality
hasmappingquality
refname
hasrefname
nextrefname
hasnextrefname
These are/will be names of accessor functions of a type in Bio.jl. I think this convention prevents us from using consistent names in many places. Why is using underscores discouraged? What is the problem of inserting underscores between words like left_position?
If there are a lot of long compound terms in your API, I would just use underscores consistently between the words. In Base Julia, we try to expose only “atomic” concepts such that there isn’t a need for long compound terms, but that’s an unreasonable requirement for the broader ecosystem.
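For illustration, here is roughly what the consistent-underscore style could look like for the accessors listed above. This is only a minimal sketch: `AlignmentRecord` and its fields are hypothetical names made up for this example and are not Bio.jl's actual API.

```julia
# Hypothetical record type; the type and its fields are assumptions
# for illustration only, not taken from Bio.jl.
struct AlignmentRecord
    left_position::Union{Int,Nothing}
    mapping_quality::Union{Int,Nothing}
    ref_name::Union{String,Nothing}
end

# "has_" predicates: one underscore between every pair of words.
has_left_position(r::AlignmentRecord) = r.left_position !== nothing
has_mapping_quality(r::AlignmentRecord) = r.mapping_quality !== nothing
has_ref_name(r::AlignmentRecord) = r.ref_name !== nothing

# Accessor that throws if the value is missing.
function left_position(r::AlignmentRecord)
    has_left_position(r) || throw(ArgumentError("left position is missing"))
    return r.left_position
end

rec = AlignmentRecord(100, nothing, "chr1")
println(has_left_position(rec))
println(has_mapping_quality(rec))
println(left_position(rec))
```

The point is only that every word boundary gets the same treatment, so nobody has to guess whether it is `has_leftposition` or `has_left_position`.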
All of those read fine to me. Occasionally the English language throws up unfortunate combinations of words that we scan incorrectly. For example, is hasheight an octal hash function, or an elevation-related property? That one needs a separator. (Lisp users are fortunate in being able to use hyphens.)
You used to be able to find unfortunate website names, but it’s hard to say whether these were actually urban legends. For example, were therapistfinder.com or whorepresents.com really registered domain names?
I use, though not very consistently, the heuristic that once a name reaches 3+ words, it’s probably a good idea to start using _. I don’t know whether that’s a good rule, but in general I think it’s a useful way to draw the line: has_x is definitely overkill, while hassomekindofx needs some _'s in there. Take that as you wish.
Thank you. There seems to be no consensus on when to start inserting underscores between words. I don’t think this is a good situation for either developers or users. Developers have to think about the name of every function they define, and users may have difficulty accepting names defined by some developers.
I want to know the rationale behind this naming convention (odd, at least to me), given that some recent languages (e.g. Rust) and libraries (e.g. dplyr, tidyr) use underscores consistently.
In an effort to investigate using underscores “consistently” (as we’ll see, this is hard to pin down), I made a list of all lowercase exports from Base:
It splits every lowercase Base export wherever you could possibly argue that there’s a word break. The first observation is that if we used underscores between all of these, it would be pretty awful. If we don’t go all the way and use underscores everywhere, then we have to draw a line between what gets underscores and what doesn’t. Depending on how we do that, it would throw away the familiarity that many of these names have because they come from C, Matlab, Python, etc., which arguably makes their spelling harder to remember rather than easier. The current policy of avoiding underscores altogether goes in the complete opposite direction, but it does avoid the difficult and subjective task of deciding which separations get underscores and which don’t. Instead, it forces us to try to keep all the names as short and atomic as we can so that they’re readable.
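If you want to look at such names yourself, something like the following should gather them. This is just a sketch and not the script used to build the list above: the filter is my own assumption about what counts as a "lowercase export", and it only lists the names without attempting to split them into words.

```julia
# A name counts as "lowercase" here if it has at least one lowercase letter
# and otherwise consists only of lowercase letters, digits, '_' or '!'.
# This rule is an assumption made for this sketch.
is_lowercase_name(s) = any(islowercase, s) &&
    all(c -> islowercase(c) || isdigit(c) || c in ('_', '!'), s)

# names(Base) returns the exported symbols of Base.
lowercase_exports = sort(filter(is_lowercase_name, string.(names(Base))))

# Which of them already contain an underscore?
with_underscore = filter(s -> occursin('_', s), lowercase_exports)

println(length(lowercase_exports), " lowercase exports, ",
        length(with_underscore), " of them with underscores")
```

The counts depend on the Julia version, so none are quoted here.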
On the other hand, most user code bases are not like a language standard library:
They don’t have well-known, familiar names that are derived from other systems.
They don’t have as strong a need to expose only atomic and indivisible concepts.
This is why I’ve suggested that Base continue to avoid underscores, while packages are encouraged to use underscores consistently if they contain lots of longish compound terms – as your library seems to.
Note that Jeffrey Sarnoff started the JuliaPraxis organisation specifically for discussing topics like this https://github.com/JuliaPraxis
There is a big collaborative document on naming practices https://github.com/JuliaPraxis/Naming with a discussion of underscores and such as well. All the advice there is based on broad discussions on the dedicated Gitter channel, JuliaPraxis/Naming.
I am not saying everything there is right, or should be adhered to by people, simply that it is a good place to store and discuss these issues so the outcome of the discussions will also be useful in the future.
My personal view on this is that using underscores is quite a pain, as it hurts my typing speed. I much prefer to capitalize the second and subsequent words.
e.g.: leftPosition hasMappingQuality
I know this is also “against” the stylistic conventions, but I find it both faster to type and more concise. The most important thing, I guess, is not to use a capital letter at the beginning, as in HasMappingQuality, since that would conflict with the conventions for modules and types.
Thank you for your detailed clarification. Now I understand the current naming is reasonable and better than I thought.
[quote=“StefanKarpinski, post:8, topic:2504”]
On the other hand, most user code bases are not like a language standard library:
They don’t have well-known, familiar names that are derived from other systems.
They don’t have as strong a need to expose only atomic and indivisible concepts.
This is why I’ve suggested that Base continue to avoid underscores, while packages are encouraged to use underscores consistently if they contain lots of longish compound terms – as your library seems to.
[/quote]
If this is going to be the consensus of the community, I think it should be described somewhere in the manual. I stuck to the underscore convention and struggled to find a way to work with it.
The JuliaPraxis project looks helpful. I hope guidelines of this kind are accepted by the community.
For me, hasmappingquality is hard to read because my brain tricks me into thinking it’s hasHmappingquality. That’s a side effect of 20+ years of programming…
So there I’d definitely like to see separation (underscore or case change). In Base Julia there are a few of those as well that trip me up unless I really focus on the name. For example, “haskey”: my brain suggests “hashkey”.
I believe we at some point considered writing haskey(d, k) as k in keys(d). Maybe there was some performance issue with that, but perhaps we could still make that change. This is a good example of how refactoring into more fundamental vocabulary avoids smashing words together: instead, you compose the concept out of more general pieces.
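To make that concrete, both spellings already work on a plain `Dict` today. A small sketch (the dictionary contents are made up for the example):

```julia
d = Dict("left" => 1, "right" => 2)

# Compound name: "has" + "key" smashed together.
a = haskey(d, "left")

# Composed from two more general pieces: `in` and `keys`.
b = "left" in keys(d)

println(a == b)

# The composed form extends naturally to other questions, e.g. about values.
c = 2 in values(d)
println(c)
```

The composed form reads almost like the English sentence, and there is no word boundary to misread.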
I think camelCase is unquestionably better from a consistency point of view (it’s much more likely that people will do the right/same thing), but this ship has sailed. If you write camelCase, you make your code super inconsistent with all the other code in the Julia ecosystem, which really sucks. Look at R to see the pain this can create: even something as fundamental as read.table has arguments with dotted names (the R equivalent of _ names) and camelCase names in the same argument list. I really think swimming upstream on this in any language is a bad idea; the best solution is to just submit to the conventions and forget about one’s own preferences. Otherwise you just make beautiful code that much harder to write for anyone who uses what you made.