Naming: Remove all underscores no matter what?

Maybe \APLboxquestion ⍰

I see I missed \Elzglst above for ʔ, but AFAICT does not have a similar expansion ATM.

1 Like

I hate this idea; typing in unicode chars is always a terrible hassle and requires people to look up, configure, and then remember how to get this char, for every single editor they use and every single weird character used in some package (and I guess most people use at least 3 editors - main editor, REPL, Jupyter - on at least 2 installations). I know that my hatred of unicode names is not shared by most people here; but so far, most packages use ascii-only names for real functionality and only export unicode stuff as non-essential gimmicks (I’m thinking of e.g. \nabla-stuff for AD).

hasselfloops is terrible; hasSelfLoops also sucks; has_self_loops is totally fine, regardless of conventions in base. If you find a name that is both good and also adheres to the now-changing conventions, fine; otherwise, break the convention.

I think consistency in julia is strongly overrated, and adherence to global conventions is non-negotiable (e.g. acos, abspath). Most people who program at all do so in multiple languages; having shared conventions helps (goal: allow people fluent in multiple languages to pick up some julia in a couple of days). I do not think that julia’s target audience should be “people who will only use julia in all of their lives”; julia is not mathematica. Sure, it is a judgement call, but so what? Make a judgement!

1 Like

LaTeX expansions are semi-standard, you should be able to set up any decent editor with them. ?char in the Julia repl also tells you how to type them.
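For example, in REPL help mode (output abridged; the exact wording may differ across Julia versions):

```
help?> μ
"μ" can be typed by \mu<tab>
```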

This is a strong statement that could use some supporting evidence (besides anecdotal). My perception is the opposite: scientists tend to have a “main language”, which they learn really well, with all its quirks and special features (and of course this language can change over time, but there is usually a single one, or at most two for most people). Programming in many languages is possible, but often leads to non-idiomatic code in all of them.

This is one of the sources of the so-called “two-language problem” that was a strong motivation for developing Julia; switching between multiple languages is like a cache miss for one’s brain.

That depends on one’s definition of “pick up”. To use Julia idiomatically, I think that even experienced programmers would need about 1000 hours (approximately 4-6 months full time work), perhaps somewhat shorter with prior exposure to multiple dispatch and macros (but very few languages have these). In any case, I think that the significance of surface syntax in the learning process is overrated: it is nicer of course when consistent, but given a good help system it is not one of the greatest hurdles in learning Julia.

When reading latex source, I can see how to get the char from each occurrence in the code. I can also easily grep for it. Unicode-heavy julia code looks like the output of a latex-expansion preprocessor to me; I want to work with un-expanded source. \mu is more readable than a unicode mu to me (in monospaced text mode; of course, nothing beats a nicely compiled pdf). I am aware that this is a minority opinion here, but so far base and most packages have been pretty accommodating to this; since sbromberger asked, these are my two cents.

For what it’s worth, I would be quite happy with a language-level decision that source-code symbols are always equivalent to some form of latex-expansion (e.g. x\:mu:y is always equivalent to xμy; I am not proposing a specific syntax here, just pick something that doesn’t parse today, and apply the transformation during parsing/tokenization). That way, I could de-unicodify source text (outside of string literals, of course) before working with it, and julia would also specify the set of expansions that all editors are supposed to adhere to.

The way I see it, you need to compare edge\APLboxquestion vs has_edge, not edge⍰ vs has_edge, with the bonus difficulty that edge⍰ does not tell you how to type it, and is subject to confusion between unicode chars that look almost the same.

Extremely dirty statistic: the Stack Overflow survey https://insights.stackoverflow.com/survey/2017#technology-languages-over-time. Adding up the percentages of respondents who used one of the top 10 popular languages yields 277 percentage points. In other words: the average (not median) Stack Overflow respondent uses 2.77 languages out of {C#, C, PHP, C++, Objective-C, Java, Ruby, JavaScript, Python, Node.js}.

This does not count “small” languages like julia, go, rust, lisp, fortran, lua, matlab/octave, mathematica, haskell, assembly, tex (writing packages, not typesetting documents), etc; I would guess that the numbers look even larger for julia, because people who are willing to learn a niche language… are willing to learn a niche language. (and yes, counting javascript and node.js separately is rather questionable)

Sure. I picked up on julia quite recently and my code still kinda sucks. But internal consistency in naming is irrelevant for people who know the language inside-out.

Imho, the two-language problem is not working in different languages; rather, it is working in different languages in the same project, on the same data, especially if one of the languages sucks at working with the data representation preferred by the other language. For simple things, Julia and C often work nicely together because structs and most types are binary compatible; C and python are a pain because you need to learn CPython internals.

I agree. Just saying that the name that programmers from other languages would expect is often better than some attempt to be internally consistent at all costs.

2 Likes

There’s a discussion about using ? instead of the is / has prefix convention here: Question mark in variable names

I’m not a fan, and I am even more critical about using a unicode-questionmark thing. I think unicode is great in variable names (makes source code much nicer and more readable), but not in function or type names, and certainly not in exported names.

Isn’t that pretty much established practice? Allow unicode, but don’t force it on users?

4 Likes

De facto the approach of Julia seems to be that

  1. all functions in Base have an ASCII variant (is this correct?),
  2. some have a Unicode alias (eg ∈ and in; see the snippet after this list),
  3. packages pretty much do what they like, within the limits of the parser.
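Point 2 is easy to check in the REPL; for example (these aliases are defined in Base):

```julia
∈ === in         # true: ∈ is just another name for the in function
1 ∈ [1, 2, 3]    # true, same as in(1, [1, 2, 3])
∪ === union      # true: likewise for ∪ and union
[1, 2] ∪ [2, 3]  # [1, 2, 3]
```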

I think that it is very reasonable. I am constantly on the fence about Unicode in code, first I thought it was a gimmick, then found it very useful, then saw that it is easy to get carried away. I think that we need time to explore the trade-offs offered by this feature, and packages are the best place for that.

4 Likes

half cent comment: findshorttestpath is obviously a case where it’s more confusing than find_short_test_path

Have you seen any packages that export functions or types with unicode names? Just curious.

Yes.

The point I have been making isn’t that removing underscores improves readability – it certainly does not. But doing so might encourage you to look for a different name.

I don’t like the use of Unicode characters either. Sometimes it is difficult to tell which character is part of the name and which character is an operator!

2 Likes

With the lamentable exception of infix-xor. Argh, the pain!

Some are Julia specific, like \xor or \euler.
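For illustration (tab completions in the REPL; both expansions are Julia-specific rather than standard LaTeX):

```julia
true ⊻ false   # true; ⊻ === xor, typed as \xor<tab>
ℯ              # 2.718281828459045..., Euler's number, typed as \euler<tab>
```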

I think that’s the difference - most programmers I know tend to know 3 or 4 languages, and will pick the one that’s most appropriate for the task at hand: JavaScript (or TypeScript etc.) for web scripting; C, C++, Python, or bash for other tasks.

Depends on the programmer. The one thing I really do differently is that I always try to parenthesize defensively, and not depend on the particular precedence levels of the operators in a language; for the rest, I do write idiomatic code in each.

The hardest problems come not when something is completely different, but rather when something is what is called a “false cognate” in natural languages, when a word looks the same, but has a different meaning.

2 Likes

If you used StringLiterals.jl (https://github.com/JuliaString/StringLiterals.jl), you could use LaTeX entity names (as well as HTML, Unicode, and Emoji entity names) in your string literals, for example f"This is a \<dagger>".
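Going by that package’s README (untested here, so treat this as a sketch rather than a guaranteed API):

```julia
using StringLiterals

# f"..." literals support \<name> LaTeX entities, \&name; HTML entities,
# and \:name: emoji, per the package documentation.
f"This is a \<dagger>"   # "This is a †"
```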

Totally agree. I strive for pure ASCII in source code myself.

👍

The irony here is that it’s find_shortest_path - shortest, not “short test” 🙂

In any case, thanks for the feedback. We’ve decided to remain with underscores: as much as I dislike them, there’s really no practical alternative.

4 Likes

Very few I know of among the registered ones. Eg IntervalSets.jl exports ±, Tau.jl exports names using τ.
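For reference, the IntervalSets.jl export works roughly like this (based on its README; details may vary by version):

```julia
using IntervalSets

3 ± 1            # the closed interval 2..4
(3 ± 1) == 2..4  # true
```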

Whether the ASCII and the unicode versions have the same infix/prefix syntax is an orthogonal issue, eg √ vs sqrt. The important thing is that you can write everything in ASCII if you need to.
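Concretely (both spellings name the same Base function; only the surface syntax differs):

```julia
√ === sqrt   # true
√2           # 1.4142135623730951, parsed as the prefix call sqrt(2)
```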

Thanks, that’s nice. But my point about name-transformations in source code is the following:

During parsing of the source, julia already normalizes names, like different unicode spellings of the greek letter mu. That means that a runtime Symbol(String) conversion does not do the same thing the parser does; you can copy-paste a name from your source into a string literal, convert it to a Symbol at runtime, and end up with a different symbol.
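A concrete instance, assuming the parser’s current normalization rules (e.g. U+00B5 MICRO SIGN is mapped to U+03BC GREEK SMALL LETTER MU in identifiers, while runtime strings are left untouched):

```julia
µ = 1                        # source contains U+00B5; the parser stores :μ (U+03BC)
s = Symbol("µ")              # a runtime string keeps U+00B5 as-is
s == :µ                      # false: the :µ in source was normalized to :μ
codepoint(first(String(s)))  # 0x000000b5
```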

Now there are several semantically equivalent transformations possible on source code (post parsing), like whitespace changes, comment removal/introduction, changing julia-equivalent unicode spellings, etc (all of these can be done on source pieces separately!). These cannot be performed on unparsed source text, because the unicode spelling is semantically meaningful in string literals. There is AFAIK no transformation possible on string literals, due to things like r"foo" and f"foo" (of course you can change the encoding of the source file, but that is not interesting).

What I would wish for is a parsing-time, semantically equivalent normalization that replaces all non-ascii names outside of string literals by a normalized form, defined by the language and optimized for readability (e.g. turn all xμy into x\:mu:y, or into x\:u03bc:y if no human-readable latex-like name is defined).

In my dream world, julia would forbid all non-ascii tokens (except inside string literals), define the big fat transformation table that maps between \:mu: and μ, and it would be the job of the IDE to display this as a μ. For IDE-users everything stays the same. Working with non-ascii julia names is already near impossible without an IDE. I think that, when using a non-IDE editor (i.e. an editor without julia-mode), almost everyone would prefer ascii over unicode. Note that this only works as long as we can parse the code well enough to distinguish code, comments and string literals.
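A toy version of such a normalization pass could look like the following. This is purely illustrative: deunicodify is a hypothetical helper, the \:name: escape syntax is the one proposed above (not anything Julia defines), and the string-literal/comment problem is deliberately ignored. It reuses the REPL’s LaTeX completion table:

```julia
import REPL

# Reverse the REPL's LaTeX completion table: "μ" => "\\mu", etc.
const unicode_to_name = Dict(v => k for (k, v) in REPL.REPLCompletions.latex_symbols)

function deunicodify(src::AbstractString)
    io = IOBuffer()
    for c in src
        if isascii(c)
            print(io, c)
        else
            name = get(unicode_to_name, string(c), nothing)
            if name === nothing
                # no human-readable name defined: fall back to the codepoint
                print(io, "\\:u", string(UInt32(c), base = 16), ":")
            else
                print(io, "\\:", name[2:end], ":")  # drop the leading backslash
            end
        end
    end
    return String(take!(io))
end

deunicodify("xμy")   # returns "x\:mu:y"
```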

How? I can generate the symbol, using the string-to-symbol conversion and \u syntax in my string literals. Ah, but getfield(sometype, somesym) is AFAIK not resolved at inference time, even if somesym is a global constant Symbol. Also, how do I look up a function from a symbol (interned string), and a type from a symbol, such that this all resolves at inference time?
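The runtime lookup itself can be done with Base reflection; whether it constant-folds is exactly the open question (a minimal sketch):

```julia
f = getfield(Base, Symbol("sqrt"))     # look up a function by Symbol
T = getfield(Base, Symbol("Complex"))  # look up a type by Symbol
f(2.0)       # 1.4142135623730951
T{Float64}   # Complex{Float64}
```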

That is, can I use a unicode-exporting package using ascii only (generating identical @code_native for my exports; computations done during loading / compilation are allowed to differ)?

Preferably without calling the parser? (of course eval should work: transform the unicode-using source file into e.g. base64, include it as a string literal and eval it)

I see that some people feel the removal of underscores has “gone too far”, but I don’t see many concrete suggestions. What are the cases where we removed underscores, but shouldn’t have? Is method_exists better than hasmethod? If somebody has a list of the “worst offenders” that should have underscores I think we’d entertain it.

2 Likes

So in your dream world, people only speak English?

3 Likes