Unicode: a bad idea, in general

Ronis_BR · June 5, 2023, 8:04pm

Thanks! I did not know about that specific symbol. I will certainly change the code at some point.

mikmoore · June 5, 2023, 8:15pm

DNF:

tanh(square_root_of_wavenumber_square_minus_angular_frequency_squared_over_shear_velocity_squared * thickness/2) / tanh(square_root_of_wavenumber_squared_minus_angular_frequency_squared_over_longitudinal_velocity_squared * thickness/2) - 
4*square_root_of_wavenumber_squared_minus_angular_frequency_squared_over_longitudinal_velocity_squared*square_root_of_wavenumber_square_minus_angular_frequency_squared_over_shear_velocity_squared * wave_number^2 / (wave_number^2 + square_root_of_wavenumber_square_minus_angular_frequency_squared_over_shear_velocity_squared^2)^2

These names are so uselessly literal that they’re hardly better than alpha and beta at best. I would argue they’re worse, since they take such a long time for the reader to parse and I still don’t know where the parentheses go. Do they really not have any interpretation you can put a better name to?

In any case, I think this is a straw man. It has nothing to do with unicode – you wrote out the same thing with alpha and beta (rather than their unicode forms) and it was just as readable.

I would write this as

# forgive me if I mis-guessed where the parentheses belong
alpha = sqrt(wavenumber^2 - angularfrequency^2) / longitudinalvelocity^2 
beta = sqrt(wavenumber^2 - angularfrequency^2) / shearvelocity^2

tanh(beta*thickness/2) / tanh(alpha*thickness/2) - 4alpha*beta*wavenumber^2 / (wavenumber^2 + beta^2)^2

If alpha and beta are only used on these lines, then those names might serve. But (especially if they live longer to be used elsewhere) I would still try to give them better names. Whether they are meaningless or meaningful subexpressions, what they represent is fairly clear from their nearby definitions. Much more-so than the overly literal names (which were also longer, I think).
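For reference, here is a minimal runnable sketch of the same expression. It assumes the parenthesization that the long names seem to spell out, sqrt(wavenumber^2 - angularfrequency^2 / velocity^2), which differs from the guess above. The function name and argument order are hypothetical, and `complex` is only there so that a negative radicand does not throw.

```julia
# Hypothetical helper: returns the left-hand side of the expression above,
# assuming alpha = sqrt(k^2 - ω^2 / c_L^2) and beta = sqrt(k^2 - ω^2 / c_S^2).
function lamb_residual(wavenumber, angularfrequency, thickness,
                       longitudinalvelocity, shearvelocity)
    # complex() keeps sqrt defined when the radicand is negative
    alpha = sqrt(complex(wavenumber^2 - angularfrequency^2 / longitudinalvelocity^2))
    beta = sqrt(complex(wavenumber^2 - angularfrequency^2 / shearvelocity^2))

    return tanh(beta * thickness / 2) / tanh(alpha * thickness / 2) -
           4alpha * beta * wavenumber^2 / (wavenumber^2 + beta^2)^2
end

# Arbitrary illustrative inputs, not taken from any real material
r = lamb_residual(2.0, 1.0, 1.0, 2.0, 1.0)
println(r)
```

With alpha and beta defined two lines above their use, the short names cost the reader very little.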

PetrKryslUCSD · June 5, 2023, 8:18pm

I think the names were over-the-top provocations to make readers smile. But your point is well taken: it is possible to avoid using Unicode while maintaining legibility.

Ininterrompue · June 5, 2023, 8:49pm

My two cents: I would be turned off seeing this code. The dels are fine, but the star (why not just “star”), subscript n and m (why not just the normal n and m), and whatever is above the del in the second function (an arrow?) are things which I wouldn’t know how to type. Yes there is copy and pasting into the REPL, and I’m sure it’s adhering very close to the relevant formulae in the literature, but from a beginner’s perspective, knowing the REPL help exists in the first place, having to copy and paste over and over until I remember, or needing a cheatsheet…these are all barriers to entry. I’m sure these are not the only Unicode symbols in the package.

The only things I would use Unicode for are Greek letters, and only because it makes equations more concise. Even so the Greek letters should be documented at the very minimum. They suffer from the same drawbacks as other one letter variable names do (except dummy variables).

As far as keywords go, I’m reminded of an example of a function to train an ML model where \lambda is used instead of learning_rate. The latter is more explicit and more readable, no issues. For \lambda, it could be referring to anything, really. Finally, and I’ve been seeing this in a lot of places: using \in instead of just in while writing a for loop is also an unnecessary barrier to readability - not everyone is familiar with mathematical notation. You could get away with it in a package specifically about pure mathematics, but everywhere else there’s going to be people wondering what it means. “Is it different from in or = ?”
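To answer the “is it different?” question concretely: in a Julia `for` loop header, `in`, `∈` and `=` are interchangeable. A small sketch (the function names are made up for illustration):

```julia
# Three spellings of the same loop; only the header token differs.
function sum_with_in(xs)
    total = 0
    for x in xs
        total += x
    end
    return total
end

function sum_with_unicode(xs)
    total = 0
    for x ∈ xs      # typed as \in<TAB> in the REPL
        total += x
    end
    return total
end

function sum_with_equals(xs)
    total = 0
    for x = xs
        total += x
    end
    return total
end

println(sum_with_in(1:3), " ", sum_with_unicode(1:3), " ", sum_with_equals(1:3))
```

So the choice is purely stylistic, which is an argument for picking the spelling the widest audience can read.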

DNF · June 5, 2023, 8:51pm

That was exactly the point. To demonstrate that there aren’t always nice, brief self-explanatory names for useful variables. And how bad it can get when you insist on descriptive names.

None that I know. And if there were, I could find some other example in their place.

No, it was not. It was strictly less readable, with no more explanatory power. And I could easily find a more convoluted expression, where names like alpha, beta or theta show up multiple times in complex combinations to make this example even more extreme.

My point, on the other hand, was that it often isn’t possible, nor desirable.

The_Mastermage · June 5, 2023, 9:35pm

I want to add here that we are currently still discussing physics, where we often actually have reasonable explanations or descriptive variable names. If we go over to purer math, we will often not have very descriptive names anyway.

DNF · June 5, 2023, 10:18pm

No, the above is physics. The expression is from one of the characteristic equations of Lamb waves. In my experience, physics has at least as ‘ugly’ expressions as pure maths, since it often has to deal with messy ‘real life’.

Anyway, I thought we were discussing programming, not physics.

mikmoore · June 5, 2023, 11:17pm

Sorry, I lost the thread of even my own argument as I was writing my above reply. The point I should have been making is this:

I don’t think it’s controversial that some variable names are better than others. Which names are better is an aesthetic choice that people may honestly disagree upon.

a is certainly not intrinsically a worse variable name than \alpha (and, as a bonus, I can type a into this box without copy-pasting from elsewhere). One also has A, alpha, some_function_of_previous_values_that_i_cannot_give_a_better_name, and many other ASCII alternatives. In my personal opinion, any of these (including a and \alpha) are mediocre variable names that should only be used if no descriptive name exists or as a subexpression to something that closely follows.

The original post discussed the hazard of unicode symbols that may at-a-glance be confused with ASCII or other unicode symbols. I also raised the type-ability in this post by refusing to copy-paste symbols from my REPL.
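As a small illustration of that at-a-glance hazard, the sketch below prints the code points of a few pairs that look alike in many fonts. The pairs are my own picks and `codepoint_label` is a hypothetical helper, not something from the original post.

```julia
# Hypothetical helper: format a character's code point as U+XXXX
codepoint_label(c::Char) = "U+" * uppercase(string(codepoint(c), base = 16, pad = 4))

# (ASCII letter, lookalike) pairs; the Cyrillic letter is written as an escape
# so that it cannot be mistaken for the Latin one in this listing.
lookalikes = [
    ('p', 'ρ'),        # Latin p vs Greek rho
    ('a', '\u0430'),   # Latin a vs Cyrillic a
    ('v', 'ν'),        # Latin v vs Greek nu
]

for (plain, other) in lookalikes
    println(plain, " ", codepoint_label(plain), "  vs  ", other, " ", codepoint_label(other))
end
```

Each pair is two distinct characters, so two variables named with them are two distinct variables, however similar they look on screen.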

What I was failing to get at in my meandering earlier post is a different hazard of unicode: a is not a good variable name when a meaningful alternative exists. \alpha is not better, but sometimes people forget this because it’s not a single ASCII letter.

I’ve collaborated with people that use unicode to implement math that only ever existed on a whiteboard or scrap of paper. When I get that code, it’s a pain for me to puzzle through it without that reference at-hand. This has nothing to do with their choice of \delta or delta, but I wonder whether they might have managed to come up with a more descriptive alternative if \delta wasn’t an option they reached for so readily.

kevbonham · June 6, 2023, 2:23am

I used to have a keyboard shortcut that swapped my keyboard input to Greek so that I could just hit the ‘a’ key for \alpha. It’s even easier on my moonlander to map all kinds of Unicode symbols.

ChrisRackauckas · June 6, 2023, 4:37am

I spy Diffractor.

DNF · June 6, 2023, 7:17am

Not intrinsically worse, but in context it may well be worse.

I don’t really give that any weight at all when choosing variable names.

Sometimes there are no useful descriptive variable names (such as in my example further upthread). Sometimes there are, but they are still quite long, and make mathematical expressions hard to read.

For example: incidence_angle isn’t a bad name. On its own, probably better than θᵢ or θi. But inside complicated expressions, where it repeats multiple times as part of sin(3-θi^2/2)/cos(2θi/3), etc. and together with several other similar symbols, it can severely harm readability. Anyone reading the code for understanding will know the context of θi, and will find it much easier to read the expression as a whole when the symbols are short and distinctive.

The often, IMO, monomaniacal obsession with making each individual symbol descriptive and understandable, can end up sacrificing the readability and clarity of the overall expression. Using a instead of α, or t instead of θ is possible, but if the Greek letters have strong conventions and associations attached to them, I don’t think it’s a good trade-off.

The principle is really very easy: Use unicode symbols when they improve the readability of your code. Using ρ next to p makes readability worse, so why use it? If someone uses variable names l, I and O, that just means they chose bad names, not that ASCII is a bad idea, in general.
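One possible compromise for the incidence_angle case, sketched under the assumption that the descriptive name lives in the function signature and the short symbol is a local alias defined right next to the math. The function names are hypothetical, and the formula is just the illustrative fragment from above.

```julia
# Descriptive name at the API boundary, short symbol inside the expression.
function angle_ratio(incidence_angle)
    θi = incidence_angle   # local alias, defined one line above its use
    return sin(3 - θi^2 / 2) / cos(2θi / 3)
end

# The same computation with the long name repeated inside the expression.
function angle_ratio_verbose(incidence_angle)
    return sin(3 - incidence_angle^2 / 2) / cos(2 * incidence_angle / 3)
end

println(angle_ratio(0.4) ≈ angle_ratio_verbose(0.4))
```

Both functions compute the same value; the difference is only in how quickly the expression can be read as a whole.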

The_Mastermage · June 6, 2023, 7:51am

What I meant with my statement is that in math you more often have some equation, for example the solution to some differential equation, that uses an arbitrary alpha or beta with no more descriptive name. In physics such a quantity usually has some physical intuition behind it, and therefore a meaning. Not that this doesn’t happen in physics, but it is much rarer.

Isn’t physics just the programming of the Universe?

DNF · June 6, 2023, 8:02am

I agree, in pure mathematics you more often have completely abstract entities with no obvious ulterior ‘meaning’. But in physics you often get complicated compound expressions that also have no obvious intuitive meaning or easy translation to a concise name.

The_Mastermage · June 6, 2023, 8:13am

Fair enough, good point to raise. In my part of physics we luckily have mostly reasonable descriptors for the variables. But if you for example touch QFT or GR you will likely have very convoluted terms.

xgdgsc · June 6, 2023, 8:34am

Just use a more visually distinguishable Unicode character set, like Chinese characters. With extensions like 中文代码快速补全 (Visual Studio Marketplace), the input efficiency can be very high and the result visually crystal clear.

moble · June 6, 2023, 5:39pm

I hesitate to add my two cents, for all the derision I’m sure to attract for going overboard, but personally I come down very strongly on the side of using UTF-8 in my own code, while ensuring that the API allows people to use either UTF-8 or ASCII.

My community’s literature has very strong and fairly consistent conventions for a lot of variables, so it really helps to make the code look familiar. On the other hand, my highest-level API will often be used from python, where even something like M₁ is not allowed. So I make sure that my API accepts keywords in both forms with code like this:

function orbital_evolution(
    M₁, M₂, χ⃗₁, χ⃗₂, Ωᵢ;
    Lambda_1=0, Lambda_2=0, Omega_1=Ωᵢ,
    Λ₁=Lambda_1, Λ₂=Lambda_2, Ω₁=Omega_1
)
    # Code that only uses the Λ₁, Λ₂, Ω₁ forms
end

Yes, it’s possible for users to pass both forms of the keywords and one will be thrown away, but the documentation only suggests that it is possible to use one or the other, so I don’t think it will come up. And now, I can make beautiful calls like

orbital_evolution(M₁, M₂, χ⃗₁, χ⃗₂, Ωᵢ; Λ₁, Λ₂)

while the spoilsports would probably write that as

orbital_evolution(M_1, M_2, chi_1, chi_2, Omega_i, Lambda_1=Lambda_1, Lambda_2=Lambda_2)

Similarly, some function names can be made pretty, while allowing ugly calls:

μ(M₁, M₂) = (M₁ * M₂) / (M₁ + M₂)
const reduced_mass = μ

I may be opinionated, but my APIs can be flexible.
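A stripped-down, runnable sketch of the same keyword pattern, for anyone who wants to try it. The function `weighted_tides` and its body are hypothetical stand-ins; only the keyword-default chaining mirrors the code above.

```julia
# The Unicode keywords default to their ASCII twins, so callers can use either.
function weighted_tides(M₁, M₂; Lambda_1=0, Lambda_2=0, Λ₁=Lambda_1, Λ₂=Lambda_2)
    # The body only ever uses the Unicode forms
    return M₁ * Λ₁ + M₂ * Λ₂
end

Λ₁, Λ₂ = 3, 4

# Unicode call, using the `; name` keyword shorthand (Julia 1.5 or later)
println(weighted_tides(1, 2; Λ₁, Λ₂))

# ASCII call, as it might arrive from Python
println(weighted_tides(1, 2; Lambda_1=3, Lambda_2=4))

# If both forms are passed, the Unicode one wins and the ASCII one is ignored
println(weighted_tides(1, 2; Lambda_1=3, Λ₁=5))
```

This works because keyword defaults are evaluated left to right, so a later default may refer to an earlier keyword.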

Ronis_BR · June 6, 2023, 6:06pm

Wow! This approach is a very nice one. I will see if I can incorporate this suggestion in my packages. Thanks!

Eben60 · June 6, 2023, 6:09pm

Unicode: a great idea, if used prudently!

Just finished implementing some small package

The calculations therein are based mostly on one paper. It was immensely helpful for me to be able to use the same notation as in the paper, with all these α, β, ϕ, Rₛ, Rₚ etc. I was even able to copy some formulas from the paper’s pdf into the code and just edit it slightly.
In physics, μ₀ is unambiguous, at least in the context of electromagnetism, and formulas using it are better readable IMO.
One of the problems which cost me half ~~a day~~ a night of debugging was that elliptic integrals in SpecialFunctions are defined using m as argument, not k, with k²=m. I hadn’t RTFM, only quickly checked the function help, and didn’t pay attention to having m here and k in Wikipedia. Had the function help used k², which is just as valid an identifier, instead of m, it could have saved me a lot of time.
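To make the m versus k pitfall concrete, here is a self-contained sketch of the complete elliptic integral of the first kind in both conventions. It does not use SpecialFunctions; it relies on the arithmetic-geometric mean identity K(m) = π / (2 · agm(1, sqrt(1 - m))), and all three function names are hypothetical.

```julia
# Arithmetic-geometric mean, iterated until the two means agree to rounding
function agm(a, b)
    a, b = float(a), float(b)
    for _ in 1:64
        abs(a - b) <= 4 * eps(max(a, b)) && break
        a, b = (a + b) / 2, sqrt(a * b)
    end
    return (a + b) / 2
end

# Parameter convention: the argument is m
function ellipk_parameter(m)
    0 <= m < 1 || throw(DomainError(m, "this sketch only handles 0 <= m < 1"))
    return π / (2 * agm(1, sqrt(1 - m)))
end

# Modulus convention: the argument is k, with m = k^2
ellipk_modulus(k) = ellipk_parameter(k^2)

println(ellipk_parameter(0.5))   # m = 0.5
println(ellipk_modulus(0.5))     # k = 0.5, i.e. m = 0.25: a different number
```

Passing a modulus where a parameter is expected gives a plausible-looking but wrong result, which is exactly why it is hard to spot.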

Nathan_Boyer · June 6, 2023, 6:39pm

I think the trick to using Unicode clearly is just docstrings and comments. Then the math can be tight, and the meaning can be clear simultaneously. Just add equation references and definitions until everything is clear, then proceed with the mathy bits.

"""
    vonmises(σ₁, σ₂, σ₃) -> σₑ
    vonmises(σₚ...) -> σₑ

Computes equivalent von Mises stress from principal stresses. Argument order does not matter.
If your principal stresses are contained in a vector or tuple, just splat it into the function input with `...`.

# Reference
ASME BPVC.VIII.2-2019 Equation 5.1
https://www.asme.org/codes-standards/find-codes-standards/bpvc-viii-2-bpvc-section-viii-rules-construction-pressure-vessels-division-2-alternative-rules/2019/print-book
"""
vonmises(σ₁, σ₂, σ₃) = sqrt(((σ₁ - σ₂)^2 + (σ₂ - σ₃)^2 + (σ₁ - σ₃)^2)/2)

σ₁ = 100 # largest principal stress
σ₂ = 50  # middle principal stress
σ₃ = -80 # smallest principal stress
σₑ = vonmises(σ₁, σ₂, σ₃) # equivalent stress (von Mises stress)

# ... more elaborate computations with the defined symbols ...

Ahmed_Salih · June 6, 2023, 7:20pm

From what I have observed, code is read much more than it is developed. Being able to read and gain understanding fast is a huge benefit, compared to the initial annoyance of typing a bit more.

Kind regards
