Unicode \epsilon\_y

Hi,
I am trying to define a variable \epsilon\_y. And it does not autocomplete. \epsilon\_x works but \epsilon\_y and \epsilon\_z do not work. I checked the Unicode Input page in Julia docs and \_y is not defined.

Could someone please help me figure out how to create this variable if it is possible.

ϵₓ
ϵ\_y

Alas, the Unicode Consortium has not seen fit to add a complete set of subscripts and superscripts to Unicode. I expect subscript y is one of those that was left out, which would make this impossible :frowning:

See for example, https://stackoverflow.com/questions/6638471/why-does-the-unicode-superscripts-and-subscripts-block-not-contain-simple-sequen

I was just thinking how ridiculous this situation is, and started looking at what it would take to write a proper proposal. But it seems @stevengj has already done so :tada:

Currently, as of Unicode 12.1, there are only 40 possible subscript characters. You can see the always current list here:

All Subscript Characters

And, it appears that Julia supports most of them (I think maybe only the ARABIC SUBSCRIPT ALEF:

"   ٖ "

is missing).
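To check which subscripts Julia's REPL can actually complete, one option is to look at the completion table directly. This is only a sketch: it assumes `REPL.REPLCompletions.latex_symbols` is the table behind tab completion, and since that name is internal and unexported it may change between Julia versions.

```julia
using REPL  # standard library that ships the tab-completion table

# Assumption: `latex_symbols` is an internal, unexported Dict mapping
# completion names (e.g. "\\_x") to the characters they produce.
table = REPL.REPLCompletions.latex_symbols

# Keep only the subscript completions, i.e. names that start with "\_".
subscript_names = sort([name for name in keys(table) if startswith(name, "\\_")])

for name in subscript_names
    println(name, " => ", table[name])
end

# Check the two completions from the original question.
println("\\_x available: ", haskey(table, "\\_x"))
println("\\_y available: ", haskey(table, "\\_y"))
```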

You might get away with using \epsilon\_gamma, as the subscript gamma (ᵧ) looks similar to “y”.
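As a rough illustration of that workaround next to the alternatives (the numeric values are made up for the example):

```julia
# Typed in the REPL as \epsilon<TAB>\_x<TAB>
ϵₓ = 0.01

# Typed as \epsilon<TAB>\_gamma<TAB>; the subscript is a Greek gamma, not a Latin y
ϵᵧ = 0.02

# Plain ASCII fallback that needs no special input
ϵ_y = 0.02

println(ϵₓ + ϵᵧ)
```

Keep in mind that `ϵᵧ` and `ϵ_y` are two different variables; anyone searching the code for "y" will not find the gamma version.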

As far as why more subscript characters don’t exist, I would say it is the design goal of Unicode. According to the official Unicode Standard:

the Unicode Standard does not encode idiosyncratic, personal, novel,
or private-use characters, nor does it encode logos or graphics.

and:

The Unicode Standard does not attempt to encode features such as language, font, size,
positioning, glyphs, and so forth.

Thanks for the suggestion. I settled for ϵx and ϵy, the next best versions of what I wanted. One of the nice perks of using Julia is that we can use actual math symbols as variable names. I was hoping we could write math like in LaTeX, but I guess we do not have the full power of LaTeX in Unicode. Longer subscripts like βₑₓₚ are also a problem: they look odd and are tedious to type as \beta\_e\_x\_p.

I hope there is a solution or an external package for Julia that allows us to use variable names like in LaTeX.
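For the typing part, a small helper could build the subscripted name from plain text so it can be pasted into code. This is a hypothetical sketch, not an existing package: `to_subscript` is a made-up name, and it relies on the internal `REPL.REPLCompletions.latex_symbols` table, so it only covers letters that have a `\_` completion (it will raise an error for `y`).

```julia
using REPL  # standard library that ships the tab-completion table

# Hypothetical helper, not part of Julia or any package.
# Converts each character of `s` to its subscript form, if one exists.
function to_subscript(s::AbstractString)
    table = REPL.REPLCompletions.latex_symbols  # internal, may change
    out = IOBuffer()
    for c in s
        key = "\\_" * c  # e.g. "\\_e"
        haskey(table, key) || error("no subscript completion for '$c'")
        print(out, table[key])
    end
    return String(take!(out))
end

name = "β" * to_subscript("exp")
println(name)
```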

My understanding is that these letters are not “subscripts” and “superscripts”, but symbols in their own right, used in chemistry, linguistics, etc.

Sub- and superscripts are meaningful in the context of typesetting.

Technically, using eg as a math subscript of x is a misrepresentation hijacked for a particular purpose (yes, I am doing it too), not unlike ʇxǝʇ uʍop ǝpısdn.

From a practical perspective, I found people with less than perfect eyesight find these difficult to read, so I try not to use them when I want my code to be widely accessible.

Yes, the existing subscripts and superscripts have been added piecemeal: some for compatibility with historical encodings, and some for their semantics in a particular field, as you say.

That’s all well and good. What I find pretty crazy is that Unicode has an entire large block of bold and italic letters for use as math symbols, but subscripts and superscripts weren’t similarly considered.

That is by design. The bold, italic, bold italic, Fraktur, sans-serif, monospace, script, double-struck, etc. “Mathematical Alphanumeric Symbols” have specific meanings, even if on the surface it looks like merely a difference in presentation.

Page 810 of the Unicode Standard (version 12.1)

The letterlike symbols include some of the few instances in which the Unicode Standard encodes stylistic variants of letters as distinct characters. For example, there are instances of blackletter (Fraktur), double-struck, italic, and script styles for certain Latin letters used as mathematical symbols. The choice of these stylistic variants for encoding reflects their common use as distinct symbols. They form part of the larger set of mathematical alphanumeric symbols. For the complete set and more information on its use, see “Mathematical Alphanumeric Symbols” in this section. These symbols should not be used in ordinary, nonscientific texts.

And, the MathML specification, section 7.5: Mathematical Alphanumeric Symbols clarifies even further:

A MathML processor must treat a mathematical alphanumeric character (when it appears) as identical to the corresponding combination of the unstyled character and mathvariant attribute value. It is important to note that the mathvariant attribute specifies a semantic class of characters, each of which has a specific appearance that should be protected from document-wide style changes, so the intended meaning of the character may be preserved. The use of a mathematical alphanumeric character is also intended to preserve this specific appearance, and so these characters are also not to be affected by surrounding style changes.

Meaning, these stylistic variations exist solely for their functional differences. This is also the reasoning behind the existence of any subscript characters, outside of backwards compatibility with characters from existing code pages. The non-compatibility subscript characters that exist all have specific, inherent meaning. On the other hand, reproducing the US English alphabet for arbitrary usage (e.g. constructing words, footnotes, etc.) would only serve a stylistic difference, since there would be no inherent meaning to each specific letter. This is why the preference for these purposes is to handle it via markup / font.

That being said, as someone who typically does not have the ability to specify markup for the situations that I deal with, I completely understand the frustration in not having complete sets of these stylistic variations. It’s quite the tease, especially with most of the ones that do exist not equating to the equivalent base English letters. At the same time, I get why Unicode doesn’t go down this road: they would have to offer similar variations for other scripts / languages, and that would get extremely messy.

Don’t understand why such complicated reasoning is needed to decide whether or not to complete the subscript and superscript alphabet.

The way I read these passages (and several others that are related but too much detail for me to have posted) makes it seem quite simple: adding more subscript characters merely so that we can use them free-form is expressly contrary to their stated mission / philosophy.

Maybe I missed it, but, what are those subscript characters actually for, then?

If you click on the “All Subscript Characters” link in my post towards the top, it will take you to the official Unicode list. At least half of them are for various phonetic alphabets (IPA, UPA), in which case each one has a very specific meaning / function. As for the ones in the “Superscripts and Subscripts” block (20 of them), the Unicode Standard (the actual publication) states (Version 12.1, page 829):

In general, the Unicode Standard does not attempt to describe the positioning of a character above or below the baseline in typographical layout. Therefore, the preferred means to encode superscripted letters or digits, such as “1st” or “DC0016”, is by style or markup in rich text.

and:

A certain number of additional superscript and subscript characters are needed for roundtrip conversions to other standards and legacy code pages. Most such characters are encoded in this block and are considered compatibility characters.

and:

Standards. Many of the characters in the Superscripts and Subscripts block are from character sets registered in the ISO International Register of Coded Character Sets to be Used With Escape Sequences, under the registration standard ISO/IEC 2375, for use with ISO/IEC 2022. Two MARC 21 character sets used by libraries include the digits, plus signs, minus signs, and parentheses.

I do understand the distinction between inherent meaning (semantics) and surface level presentation.

The point is that, in mathematics, there is a functional difference between normal letters and superscripts/subscripts. This is directly analogous to the difference between

Both of these exist separately in unicode because style is semantically significant in mathematical writing, and the same can be said for mathematical subscripts and superscripts.

I think the important point here is that these characters were not intended to be used as sub/superscripts for math & related fields. They are not a replacement for typesetting markup.

Also, arguably, in a_i and a(i) the i is not different per se, just that it is in a different place.

I don’t know if there is a consistent place to stop if Unicode starts doing this. Why not a_\breve{o}?

But now I don’t get why there are any subscripts at all. Why is ‘subscript-x’ semantically different from ‘x’, but ‘subscript-y’ is not different from ‘y’? Does the Unicode Standard explain in each case why one is in but not the other?

Yes. The existing subscripts have specific semantic meaning and established use. Eg “ₐ, ₑ, and ₒ are used to indicate the vowel coloring of a laryngeal H”. My understanding is that the intent was allowing linguists to use these without any markup (which they don’t otherwise use as extensively as STEM fields).

I think you mistake my point. I understand why the existing subscripts and superscripts are incomplete (frustrating as this may be), and I’m not arguing that they set a precedent for adding the rest.

What I intend to argue is that the Unicode block “Mathematical Alphanumeric Symbols” is an exact analogy and precedent for adding multiple styles for use in a field (mathematics) where style lends meaning. Further, it would be not only helpful but entirely consistent for Unicode to add a new block of superscripts and subscripts for all the Latin and Greek characters, because these are commonly used in mathematical notation.
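One concrete way to look at the analogy: both a mathematical bold letter and a subscript letter are distinct code points that carry a compatibility mapping back to the plain letter. The sketch below uses the `Unicode` standard library to show this; the two sample characters are just examples picked for illustration.

```julia
using Unicode  # standard library

bold_x = "𝐱"   # U+1D431 MATHEMATICAL BOLD SMALL X
sub_x  = "ₓ"   # U+2093 LATIN SUBSCRIPT SMALL LETTER X

# As encoded, neither string is equal to a plain "x".
println(bold_x == "x")
println(sub_x == "x")

# Compatibility normalization (NFKC) folds both back to the plain letter.
println(Unicode.normalize(bold_x, :NFKC) == "x")
println(Unicode.normalize(sub_x, :NFKC) == "x")
```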

I don’t feel very strongly about this either way — I find both including and not including these characters a position that can be argued for by reasonable people. I was just trying to summarize what I understood as the position of the Unicode Consortium (but of course, I don’t speak for them, and my reading of the situation could be wrong).

But as I said above, practically I find these characters problematic because commonly used fonts don’t always make them very readable: most programmers are trying to fit enough lines on the screen so that lowercase chars remain easy to read without causing eye fatigue, and these symbols are usually about half an ex tall, so overusing them can cause eyestrain. I was super-enthusiastic about them initially, but now I try to use them less because some collaborators complained. Compared to a0, a₀ does not save any space. This is just my (current) personal preference, YMMV.

Yes, I agree there are some ergonomic problems with these. The difficulty of typing such names interactively, and the fact that vim does not treat them as identifier characters, make them somewhat annoying to use. I suspect the latter problem is a symptom of the Unicode category being inconsistent with the way Julia uses them in identifiers.
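To see that mismatch, one can compare what Julia accepts as an identifier with the Unicode general category of the characters involved. This is a sketch only: `Base.isidentifier` and `Base.Unicode.category_abbrev` are unexported functions, so treat their availability as an assumption about the Julia version in use.

```julia
# Does Julia accept these names as identifiers?
for name in ("a0", "a₀", "ϵₓ")
    println(name, " is a Julia identifier: ", Base.isidentifier(name))
end

# Unicode general category of a plain letter versus the subscripts.
# An editor that only treats plain letters and digits as word characters
# will split names at the subscript.
for c in ('x', 'ₓ', '₀')
    println(c, " has Unicode category ", Base.Unicode.category_abbrev(c))
end
```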

FWIW, over the course of the last 3 years, I was

  1. initially very skeptical of the utility of Unicode in code (“it is just a gimmick”),
  2. then realized that it can be extremely useful,
  3. then got carried away with stuff like ∂Γ₀ (with a bar but that does not render correctly here)
  4. then realized that maintaining code with an excess of Unicode is not that fun.

I am now sympathetic to (implicit, or somewhat rarely, explicit) Unicode-preferences of various packages and try to follow them when I contribute, but I try to use it only when it makes my code significantly more readable.
