Unicode: a bad idea, in general

I agree: in pure mathematics you more often have completely abstract entities with no obvious ulterior ‘meaning’. But in physics you often get complicated compound expressions that also have no obvious intuitive meaning or easy translation to a concise name.

Fair enough, that's a good point to raise. In my area of physics we luckily have mostly reasonable descriptive names for the variables. But if you touch QFT or GR, for example, you will likely run into very convoluted terms.

Just use a more visually distinguishable Unicode character set, like Chinese characters, with an extension such as 中文代码快速补全 ("Chinese code quick completion") from the Visual Studio Marketplace. Input can be very efficient, and the result is visually crystal clear.


I hesitate to add my two cents, for all the derision I’m sure to attract for going overboard, but personally I come down very strongly on the side of using UTF-8 in my own code, while ensuring that the API allows people to use either UTF-8 or ASCII.

My community’s literature has very strong and fairly consistent conventions for a lot of variables, so it really helps to make the code look familiar. On the other hand, my highest-level API will often be used from Python, where even something like M₁ is not a valid identifier. So I make sure that my API accepts keywords in both forms with code like this:

```julia
function orbital_evolution(
    M₁, M₂, χ⃗₁, χ⃗₂, Ωᵢ;
    Lambda_1=0, Lambda_2=0, Omega_1=Ωᵢ,
    Λ₁=Lambda_1, Λ₂=Lambda_2, Ω₁=Omega_1
)
    # Code that only uses the Λ₁, Λ₂, Ω₁ forms
end
```

Yes, it’s possible for users to pass both forms of the keywords and one will be thrown away, but the documentation only suggests that it is possible to use one or the other, so I don’t think it will come up. And now, I can make beautiful calls like

```julia
orbital_evolution(M₁, M₂, χ⃗₁, χ⃗₂, Ωᵢ; Λ₁, Λ₂)
```

while the spoilsports (🙂) would probably write that as

```julia
orbital_evolution(M_1, M_2, chi_1, chi_2, Omega_i, Lambda_1=Lambda_1, Lambda_2=Lambda_2)
```
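A minimal, self-contained sketch of the dual-keyword pattern, using a hypothetical `tidal_sum` function in place of `orbital_evolution` (whose real body isn't shown). It checks that the ASCII and Unicode spellings reach the same value and illustrates the caveat mentioned earlier: if a caller passes both, the explicitly given Unicode keyword wins and the ASCII one is ignored.

```julia
# Hypothetical stand-in for orbital_evolution; only the keyword handling matters.
# ASCII keywords come first so that the Unicode keywords can default to them.
function tidal_sum(M₁, M₂; Lambda_1=0, Lambda_2=0, Λ₁=Lambda_1, Λ₂=Lambda_2)
    # The body only ever uses the Unicode names.
    return M₁ * Λ₁ + M₂ * Λ₂
end

println(tidal_sum(1.0, 2.0; Lambda_1=3.0, Lambda_2=4.0))  # ASCII spelling: 11.0
println(tidal_sum(1.0, 2.0; Λ₁=3.0, Λ₂=4.0))              # Unicode spelling: 11.0

# Passing both spellings is legal, but Lambda_1 is silently ignored
# because Λ₁ was given explicitly.
println(tidal_sum(1.0, 2.0; Lambda_1=100.0, Λ₁=3.0, Λ₂=4.0))  # still 11.0

# Julia ≥ 1.5 shorthand: `; Λ₁, Λ₂` means `; Λ₁=Λ₁, Λ₂=Λ₂`.
Λ₁, Λ₂ = 3.0, 4.0
println(tidal_sum(1.0, 2.0; Λ₁, Λ₂))  # 11.0
```

If silently dropping one spelling is a concern, one possible variation (not from the original post) is to default both spellings to `nothing` and throw an `ArgumentError` when both are supplied.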

Similarly, some function names can be made pretty, while allowing ugly calls:

```julia
μ(M₁, M₂) = (M₁ * M₂) / (M₁ + M₂)
const reduced_mass = μ
```
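`const reduced_mass = μ` does not wrap μ; it binds a second name to the same function object, so the two names are interchangeable. A quick sketch with arbitrary, purely illustrative masses:

```julia
μ(M₁, M₂) = (M₁ * M₂) / (M₁ + M₂)
const reduced_mass = μ   # second name for the very same function object

println(reduced_mass === μ)      # true: not a copy or a wrapper
println(μ(1.0, 1.0))             # equal masses: μ = M/2 = 0.5
println(reduced_mass(3.0, 6.0))  # 18 / 9 = 2.0
```

ASCII-only callers (for example, code reaching the package from Python) can use `reduced_mass` without ever typing μ.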

I may be opinionated, but my APIs can be flexible.


Wow! This approach is a very nice one 🙂 I will see if I can incorporate this suggestion in my packages. Thanks!


Unicode: a great idea, if used prudently!

I just finished implementing a small package:

  1. The calculations therein are based mostly on one paper. It was immensely helpful to be able to use the same notation as in the paper, with all these α, β, ϕ, Rₛ, Rₚ etc. I was even able to copy some formulas from the paper’s PDF into the code and just edit them slightly.
  2. In physics, μ₀ is unambiguous, at least in the context of electromagnetism, and formulas using it are more readable IMO.
  3. One problem that cost me half a day and a night of debugging: the elliptic integrals in SpecialFunctions are defined with the parameter m as argument, not the modulus k, where k² = m. I hadn't RTFM, only quickly checked the function help, and didn't notice that it used m while Wikipedia used k. Had the function help used `k²`, which is just as valid an identifier, instead of m, it could have saved me a lot of time.
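To make the m-versus-k pitfall concrete, here is a self-contained sketch that computes the complete elliptic integral of the first kind via the arithmetic-geometric mean, naming its argument `k²` as suggested above. The AGM method is an illustrative choice and not necessarily how SpecialFunctions implements `ellipk`; the names `agm`, `K` and `K_modulus` are hypothetical.

```julia
# Arithmetic-geometric mean of a and b (converges quadratically).
function agm(a::Float64, b::Float64)
    for _ in 1:60
        abs(a - b) <= 4 * eps(a) && break
        a, b = (a + b) / 2, sqrt(a * b)
    end
    return a
end

"""
    K(k²)

Complete elliptic integral of the first kind, K = π / (2 agm(1, √(1 - k²))).
The argument is the *parameter* m = k², which the name `k²` makes explicit.
"""
function K(k²::Real)
    0 <= k² < 1 || throw(DomainError(k², "expected 0 ≤ k² < 1"))
    return π / (2 * agm(1.0, sqrt(1.0 - k²)))
end

# Same integral, called with the modulus k instead (Wikipedia's usual convention).
K_modulus(k::Real) = K(k^2)

println(K(0.5))          # parameter m = 0.5
println(K_modulus(0.5))  # modulus k = 0.5, i.e. m = 0.25: a different value
```

Passing 0.5 to the wrong one of these silently gives a plausible but different number, which is exactly the kind of bug described in point 3.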

I think the trick to using Unicode clearly is just docstrings and comments. Then the math can be tight, and the meaning can be clear simultaneously. Just add equation references and definitions until everything is clear, then proceed with the mathy bits.

```julia
"""
    vonmises(σ₁, σ₂, σ₃) -> σₑ
    vonmises(σₚ...) -> σₑ

Compute the equivalent von Mises stress from principal stresses. Argument order does not matter.
If your principal stresses are contained in a vector or tuple, just splat it into the function input with `...`.

# Reference
ASME BPVC.VIII.2-2019 Equation 5.1
https://www.asme.org/codes-standards/find-codes-standards/bpvc-viii-2-bpvc-section-viii-rules-construction-pressure-vessels-division-2-alternative-rules/2019/print-book
"""
vonmises(σ₁, σ₂, σ₃) = sqrt(((σ₁ - σ₂)^2 + (σ₂ - σ₃)^2 + (σ₁ - σ₃)^2)/2)
σ₁ = 100 # largest principal stress
σ₂ = 50  # middle principal stress
σ₃ = -80 # smallest principal stress
σₑ = vonmises(σ₁, σ₂, σ₃) # equivalent stress (von Mises stress)

# ... more elaborate computations with the defined symbols ...
```
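The docstring makes two checkable claims: argument order doesn't matter, and a vector of principal stresses can be splatted in. A short sketch verifying both, plus the uniaxial sanity check (only σ₁ = s nonzero gives σₑ = s), reusing the same formula:

```julia
vonmises(σ₁, σ₂, σ₃) = sqrt(((σ₁ - σ₂)^2 + (σ₂ - σ₃)^2 + (σ₁ - σ₃)^2) / 2)

σₚ = [100.0, 50.0, -80.0]   # the principal stresses from the example

σₑ = vonmises(σₚ...)                        # splat a vector into the three arguments
println(σₑ)
println(vonmises(reverse(σₚ)...) ≈ σₑ)      # true: order does not matter
println(vonmises(circshift(σₚ, 1)...) ≈ σₑ) # true

# Uniaxial stress: only σ₁ is nonzero, so σₑ equals σ₁.
println(vonmises(250.0, 0.0, 0.0))          # 250.0
```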

From what I have observed, code is read much more often than it is written. Being able to read and understand it quickly is a huge benefit compared to the initial annoyance of typing a bit more.

Kind regards


Would love to see the Python version! 😂


I would love Unicode package names:

```julia
using 🍌, 🍳, 🍴
```

And of course we let the chance to use an emoji as the file name extension slip through our fingers.


Julia could also be the first to adopt an emoji for the default file extension:
MyCoolApp.🌩

Edit: heh, should have read the post above mine


Too late: Modular Docs - Mojo🔥 FAQ


Hopefully no derision at all, because I 100% agree with using the capabilities of Unicode to the fullest extent possible. In fact, it’s one of the things that attracted me to Julia in the first place. I appreciate that @PetrKryslUCSD will never agree with that, but luckily we don’t work on the same code bases 😉

Your idea of allowing both ascii and unicode keyword arguments is amazing, and I’m very likely to adopt that approach. Thanks!


For those on a Mac wanting to type scientific unicode a lot: The "U.S. International - Scientific" Keyboard Layout - Michael Goerz


A careful (conservative) use of Unicode can make code much easier to understand while keeping it clear. My rules: use only simple Unicode that (1) most programming fonts (Consolas, say) can render, and (2) is not easily mixed up with other similar-looking symbols.


You shouldn’t hesitate. I think your opinion is the majority opinion here. Unicode is good, but there should be non-unicode API options available, and your method of defining both ASCII and unicode kwargs simultaneously is clever and worth sharing.


Well, avoiding Unicode in an API (or at least offering a non-Unicode option) is good. But, as someone already pointed out, code is read more often than it is written, so we should make the graphical presentation of internal (non-API) code as easy on the reader as possible. Readers usually have little control over how Unicode is rendered (on GitHub, for instance), so problematic symbols should be prohibited, and other symbols should be weighed for visual confusion (e.g. ρ (`\rho`) vs p vs rho).


I assume that’s not every non-ascii symbol? Do you know which ones are particularly problematic?

Agreed! So it should be made readable by using whatever notation and conventions are appropriate for the problem at hand.

If I am writing code which implements some physics concept, and I expect my code to be read by fellow physicists who have similar training to me, then the most appropriate notation and conventions to clearly communicate ideas are typically going to involve a lot of unicode and single letter variables.

If I’m writing code that I intend to be read and used by a wide variety of people with a wide variety of backgrounds, then the choices as to what maximizes comprehension may look very different (or maybe they just involve better documentation).


https://util.unicode.org/UnicodeJsps/confusables.jsp

https://websec.github.io/unicode-security-guide/visual-spoofing/
http://www.unicode.org/Public/security/latest/confusables.txt
How is this for a start?
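As a rough starting point before reaching for the full confusables.txt data, here is a sketch of a source check that flags a small, hand-picked set of look-alike characters (ρ/p, ν/v, ο/o and so on). The `LOOKALIKES` table and the `confusable_chars` name are illustrative assumptions; a real tool would load the Unicode confusables data instead.

```julia
# A tiny, hand-picked subset of visually confusable characters.
# NOT the Unicode confusables.txt data; extend as needed.
const LOOKALIKES = Dict(
    'ρ' => 'p',   # Greek small rho
    'ν' => 'v',   # Greek small nu
    'ο' => 'o',   # Greek small omicron
    'Α' => 'A',   # Greek capital alpha
    'Β' => 'B',   # Greek capital beta
    'а' => 'a',   # Cyrillic small a
    'е' => 'e',   # Cyrillic small ie
    'ℓ' => 'l',   # script small l
)

"""
    confusable_chars(src)

Return `(line, column, char, looks_like)` for every character of `src`
that appears in `LOOKALIKES`. Columns count characters, not bytes.
"""
function confusable_chars(src::AbstractString)
    hits = Tuple{Int,Int,Char,Char}[]
    for (lineno, line) in enumerate(split(src, '\n'))
        for (col, c) in enumerate(line)
            haskey(LOOKALIKES, c) && push!(hits, (lineno, col, c, LOOKALIKES[c]))
        end
    end
    return hits
end

for (l, col, c, like) in confusable_chars("p = 1\nρ = p + 2\nx = ν")
    println("line $l, col $col: '$c' (U+", uppercase(string(UInt32(c), base=16, pad=4)), ") looks like '$like'")
end
```

Julia's parser already treats a few look-alike pairs as the same identifier (for example the micro sign µ and Greek μ), but most pairs like the ones in this table are genuinely distinct characters, so a check of this kind can still catch real mix-ups.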