Naming: Remove all underscores no matter what?

So in your dream world, people only speak English?


I meant: use the ASCII version. Eg sqrt, or xor.

Julia speaks julia. English is not my native language, actually, and working with German characters is always a pain. Nevertheless, all current programming languages are designed for US-layout keyboards; e.g. writing LaTeX on a German keyboard layout will damage your hands (the backslash is too hard to reach). Keywords in most languages are already English, and I think stuff like Microsoft translating Visual Basic keywords and library function names is silly (I cannot copy-paste code from a US Excel sheet into a German Excel sheet; it does not run). Oh, or the silly, silly German opcodes Siemens uses in the assembly for some of their PLCs (I only know this because of Stuxnet; I do not work with these machines).

Since the computer language itself is English-based (all library functions), and modern computers in general are English-based (x86 assembly names, LLVM assembly names, everything), I see no harm in forcing people to adopt English-compatible function/type/variable names. I am not proposing to restrict source-code comments or string literals. Some people like mathy symbols; fine. We are only talking here about the preferred way of handling Julia source in contexts that are not Julia-aware (editors without a Julia mode, git, grep, etc.), and I am claiming that Unicode is a huge pain in these contexts.

I am aware that this is a minority opinion, and I don’t want to start a huge flamewar. Sorry if I sounded too aggressive.


I hope you saw this suggestion:

My objection with getting rid of underscores is when it is done just to get rid of them, which does hurt readability (such as a recent PR wanting to change an uppercase constant to remove the _).
Changes like module_name(mod) → Symbol(mod) both get rid of the underscore and make things more generic, which is a win-win.

I agree that Unicode should stay out of exported package and Base APIs because there are cases where it’s just not easy to use. For example, some clusters I’ve used have a terminal which doesn’t display Unicode even if I’m using a Unicode-compatible terminal to view on my end.

Using Unicode in your own personal scripts is a perfectly fine, if not awesome, thing to do though. Making your variables \alpha to match a paper makes things super legible compared to alpha*beta+gamma. So as always, it’s a tool: when used appropriately it’s great, but when abused it causes problems.
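To make the comparison concrete, here is a minimal sketch of the same expression written both ways. The function names `paper_style` and `ascii_style` and the input values are made up for illustration; they do not come from any package.

```julia
# Hypothetical helper using the Greek letters as they might appear in a paper.
paper_style(α, β, γ) = α * β + γ

# The same computation with ASCII-only argument names.
ascii_style(alpha, beta, gamma) = alpha * beta + gamma

# Both spellings compute exactly the same thing; only the source text differs.
unicode_result = paper_style(2.0, 3.0, 1.0)
ascii_result = ascii_style(2.0, 3.0, 1.0)
```

The Unicode version only pays off where every reader's editor and terminal can display and type the symbols, which is the caveat about clusters raised above.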


That’s something I try to avoid.

That’s not a suggestion, that’s quoting our own philosophy back at us. Of course we first try to do without compound names.

spones has been deprecated.

Which PR is that? Was it merged?


I don’t think changing things to have longer meaningful names with underscores and using aliases in packages for historical short forms, which is part of my suggestion, has been part of that philosophy, at least as far as I’ve seen.

:100:

I don’t recall, and I think you were the voice of sanity in that case :)

I really can’t help but feel that people are responding to this issue as if the Julia devs are asking everyone to avoid using underscores in their code. That’s not what’s happening, and as I’ve pointed out a number of times now, at least the vast majority of the stuff in Base is pretty clear.

Does anyone really think that getindex should be get_index or setindex! should be set_index! or ismissing should be is_missing or anything like that? I really didn’t think anyone was arguing that, and if they are, I can’t understand why in the world you’d want to make those names more verbose; they are about as descriptive and as succinct as they can possibly be in their current form.

I’m quite sure that nobody is advocating craziness like besselfunctionofthesecondkind. How much is there really in Base that is long enough that it seems unreadable? Examples?


I actually do find having things like that more readable with an underscore, i.e. with prefixes like:
is_, has_, get_, set_. Otherwise you end up with some that are a bit confusable if you don’t already know what they are, such as isqrt. There are also examples where mashing two words together can end up with something that looks bad, is_hit looks a lot worse as ishit!

I guess this is purely a matter of opinion so that I can’t offer any further justification but I emphatically disagree with this. Let me at least provide some context to show why I feel that way.

Most of my real thinking occurs when I write things out algebraically. Like most people, when I do this, I use succinct, abstract names all the time. Nobody in their right mind would ever want to look at an equation that contains \mbox{hyperbolic_sine}(y) rather than \sinh(y). So, if you wouldn’t want to look at this when actually reasoning about things in terms of their pure mathematical abstraction, why would you want to look at hyperbolic_sine(y) in your code? Sure, isbits is not as universally understood as sinh, but the same thing applies: why make everything more verbose? In cases where symbols are already clear and understandable anyway, it just takes extra brain power to parse more verbose names.


This depends upon which domain you are coming from - for mathematicians, sinh is well understood.
Also though, I wouldn’t advocate renaming functions that have a well understood name in the literature of the domain.
However, when it is a matter of a name that is short just because of legacy reasons (names had to be significant in the first 8 bytes in linkers back in the 80s, just like file names on Unix were 14 bytes, and 8.3 on CP/M and PC-DOS), such as the names that came from MATLAB, then I’d rather see a longer descriptive name (more like the ones in Mathematica), and a MATLAB.jl compatibility package for people who want to use the names they are familiar with.
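A rough sketch of how that split could work, assuming two made-up modules: `Descriptive` holds a long name and `MatlabCompat` re-exports it under a short one. The module names and `linearly_spaced` are invented for this example and are not the API of any existing package.

```julia
module Descriptive
export linearly_spaced
# Hypothetical descriptive name: n evenly spaced values from start to stop.
linearly_spaced(start, stop, n) = collect(range(start, stop=stop, length=n))
end

module MatlabCompat
using ..Descriptive: linearly_spaced
export linspace
# The short historical name is just another binding for the same function.
const linspace = linearly_spaced
end

using .MatlabCompat

# Users who prefer the short name opt in by loading the compatibility module.
points = linspace(0, 1, 3)
```

Because the alias is a `const` binding to the same function object, methods added to the long name are automatically available under the short one.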

I agree with you. I find that imposing Unicode characters in functions is a terrible idea and ignores the fact that there are so many people who do not know what Unicode characters are. Not to mention the difficulty of typing them in many text editors.

The hyperbolic functions could be easily refactored with keyword arguments: hyperbolic = true, inverse = true, etc.

Does anyone really think that getindex should be get_index or setindex! should be set_index! or ismissing should be is_missing or anything like that?

Yes. I think this, especially when I also have get_something_that_is_a_bit_longer defined, and want SOME sort of (internal) consistency in my code. In LightGraphs, for example, we have has_vertex, which could easily (and more properly) be shortened to hasvertex, but then we have has_self_loops. Should we be inconsistent in our treatment of functions that start with has?


First of all: hasselfloops may not be a good name, but it is a great one! Very memorable.

has_self_loops on the other hand, is quite awkward. Please, keep in mind that I don’t know what a “self loop” is, but could you call it isloopy, iscyclic, hascycles, or anything like that? has(x, Loop)? Etc. etc.


This graph has self-loops (specifically, vertex 5). Loops are different than cycles, unfortunately.
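As a minimal sketch of the distinction, assume a directed graph stored as a plain list of (source, destination) pairs. This is only an illustration and not how LightGraphs stores graphs or implements has_self_loops; the helper `self_loop_vertices` is invented here.

```julia
# Edges of a small hypothetical directed graph.
# 1 → 2 → 3 → 1 is a cycle of length three; (5, 5) is a self-loop.
edges = [(1, 2), (2, 3), (3, 1), (5, 5)]

# A self-loop is an edge whose two endpoints are the same vertex.
has_self_loops(edges) = any(e -> e[1] == e[2], edges)

# Hypothetical helper: the vertices that carry a self-loop.
self_loop_vertices(edges) = [e[1] for e in edges if e[1] == e[2]]

# A graph can contain a cycle without containing any self-loop.
cycle_only = [(1, 2), (2, 3), (3, 1)]
```

Here `has_self_loops(edges)` is true because of vertex 5, while `has_self_loops(cycle_only)` is false even though that graph is cyclic.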

My intention is not to get into a renaming discussion with this particular function; rather, it’s to suggest that is_ and has_ have places in the language when the function names are longer or complex, and IMO it’s better to be consistent in your treatment of them than it is to try to shorten them and have inconsistencies.

This is why it’s a relevant discussion for this thread: since Base is moving towards is instead of is_, ideally, we’d maintain consistency with it. But we can’t.

Sure, it’s not mine either. I’m just trying to illustrate, using that as an example.

But, if you don’t want hasselfloops, I hope someone else can use it…

In my opinion, in this case yes (though I don’t want to get into arguments about individual packages. I think package developers should make their own choices; has_vertex is certainly not unreasonable, so you shouldn’t misconstrue this as a complaint about it).

I think the fundamental disagreement is between people who think verbosity is more important for readable code and those who think succinctness is more important for readable code. I don’t think there’s any objective right or wrong here, so I’m not sure what more can be said about it. As for me, I want things to look as much like their mathematical abstraction as possible and to be as succinct as possible; the verbosity just makes it harder for me to read things. I don’t care how inconsistent sinh is with other code that says hyperbolic_more_complex_function, I just don’t want to look at hyperbolic_sine. Ever. To me that supersedes everything except extreme cases where function names are really totally unreadable, even if this results in some modest inconsistency.

(by the way, just to show I’m not being difficult for the sake of being difficult, I would argue for has_self_loops over hasselfloops)

I don’t think people are disagreeing about that at all. If something has a well-known name in the literature (not just a common one in some computer languages), then it should keep that name, as people used to working in that domain should be able to understand it.
Even Mathematica, known for longer descriptive names, calls hyperbolic sine sinh.

I don’t think, however, that there are many (any?) math functions that are of the form is*, has*, get*, or set*, for example, so whether or not they use _ to be more readable is a separate issue.


If one writes code with complex expressions that contain many identifiers, then long, descriptive names make those expressions harder to read.

I think this is particularly a concern for people who write a lot of mathematically or physically oriented code. Short, concise variable names actually help legibility for us, since we look at the entire expression, not just each individual identifier. It’s not just about well-known names from the literature.

Edit: Furthermore, the code is often highly abstract, so there isn’t necessarily any reason to make the names descriptive. x, y, α, and so on are descriptive enough. The preference for brief names may come from this, too. Introducing abominations such as sin(x, hyperbolic=true, inverse=true) (as someone suggested) would basically destroy any legibility whatsoever.
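For comparison, the keyword-argument style mentioned above could be sketched roughly like this. `trig_sin` is a made-up wrapper name used so that the real `sin` is left alone; nothing like it exists in Base.

```julia
# Hypothetical keyword-driven wrapper; this is NOT how Base defines sin.
function trig_sin(x; hyperbolic=false, inverse=false)
    if hyperbolic
        return inverse ? asinh(x) : sinh(x)
    else
        return inverse ? asin(x) : sin(x)
    end
end

x = 0.5

# Keyword style: the reader has to decode two flags to learn which function runs.
keyword_style = trig_sin(x, hyperbolic=true, inverse=true)

# Named style: the established short name says the same thing directly.
named_style = asinh(x)
```

Both calls return the same value, so the difference is purely in how much the reader has to parse at the call site.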
