Unless we all use the same fonts, we might not know whether we’re seeing the same thing…
Yes. (I always say yes… :)) I can make fonts easily enough (Glyphs Mini is very good) but I’m not going to draw 10,000 Unicode characters very quickly.
Use Julia to automate it?
In this case I was imagining taking an existing open source font (I rather like the croscore fonts so I had cousine in mind) and deriving a set of subscript and superscript glyphs by automated scaling & translation. Followed by manual tweaking of individual glyphs as required. But I don’t know a lot about fonts; does that sound realistic?
Hey Chris. Long time no see!
It turns out that the maths symbol block is no more complete than the super- and subscripts. The bold and italics are all there, but if you look closely you’ll find a lot missing from the other fonts. The normal-weight calligraphic alphabet starts with ACDG, for example. So what happened?
The unicode consortium has principles. One principle is that a character is a symbol that someone is using now. No doubt there are related symbols that someone might use in the future, but those can be added to unicode in the future. This makes a lot of sense, because it makes for factual and objective choices about what to include.
Unicode don’t care much about complete alphabets, but a book from a major publisher that actually uses a calligraphic B in a formula will get their attention. The goal is a universal character set, not a set that works for every book except that one.
How do I know this? I was slightly involved in the creation of the math symbol block. It was an initiative of the APS (and maybe AMA and some other discipline societies). In the early 2000s, there was a notice on the Physical Review website, asking for readers to submit examples of publications that used odd symbols. I had just looked something up in Photonics by Saleh & Teich, and I had a few minutes to spare. That’s how a bunch of the calligraphic letters made it into unicode.
I would have thought that all books which need formulas would use LaTeX anyway (or, in 1% of cases, some weird alternative), and then whether something is in Unicode is irrelevant.
As I understand it, the issue isn’t really generating the required glyphs, it’s having a place to put them so that others can access them in a standardized way.
Creating the glyphs and adding them to a Private Use Area (here I used 0xe100 and up) is easy enough: select, duplicate, rename, scale, align, engorgio, and you’ve got another hundred or so superscript and subscript characters in your font:
Perhaps 50% scaling was a bit too aggressive…
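For anyone who would rather script that recipe than click through it, here is a rough Python sketch of the two mechanical steps: handing out Private Use Area code points from 0xE100 upward, and scaling plus raising an outline. The function names, the outline representation (contours as lists of (x, y) points in font units), and the default scale and offset are all made up for illustration; a real font editor or font library will have its own API and its own ideas about metrics.

```python
PUA_START = 0xE100  # starting code point mentioned in the post above


def assign_pua(chars, start=PUA_START):
    """Map each base character to a consecutive Private Use Area code point."""
    return {ch: start + i for i, ch in enumerate(chars)}


def scale_and_shift(outline, scale=0.6, raise_by=350):
    """Scale an outline about the origin, then move it vertically.

    `outline` is a list of contours; each contour is a list of (x, y)
    points in font units. A positive `raise_by` gives a superscript,
    a negative one a subscript. Both defaults are hypothetical values.
    """
    return [
        [(round(x * scale), round(y * scale + raise_by)) for x, y in contour]
        for contour in outline
    ]


if __name__ == "__main__":
    # A made-up rectangular "glyph", just to exercise the functions.
    box = [[(0, 0), (500, 0), (500, 700), (0, 700)]]
    print(assign_pua("abc"))
    print(scale_and_shift(box))                 # superscript candidate
    print(scale_and_shift(box, raise_by=-150))  # subscript candidate
```

The aesthetic part (stroke weight, spacing, exactly how far to raise each glyph) is the manual tweaking that no simple affine transform will do for you.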
I’m assuming that the (simpler) proposal is to get standardized Unicode values for these. Making the required glyphs isn’t an issue.
Yes indeed! Generating the glyphs is a bit of scaling and a bit of aesthetic judgement regarding placement. Still, having an example font with such glyphs seems to be a requirement of the font-based version of the proposal. In this version a new block is requested for the resulting superscripts and subscripts of Latin and Greek characters commonly used in mathematical writing. It then becomes a matter of font support.
The other version of the proposal is to add new combining characters “mathematical superscript” and “mathematical subscript” which modify the previous character to mark it as a superscript or subscript (see https://github.com/stevengj/subsuper-proposal/issues/1). From the discussion there, it seems likely that this would need changes to font rendering software.
I think both versions of the proposal have some desirable points.
Thanks for the story, it really helps me understand why things are the way they are. You’re right, I didn’t notice the missing characters within the large block of mathematical symbols in various styles.
This makes sense but there’s a definite chicken and egg problem for computer languages which desire to use “plain text” for source code without inventing their own encoding. But with that in mind, I wonder what’s a good way forward here. We could go on a similar hunt for published mathematical material using these. (Or arguably create a precedent by consistently using a block from the private use area; but that’s just not very practical in many ways.)
Side note: the STIX font project (https://github.com/stipub/stixfonts) is interesting and appears to use the private use area for glyphs not yet in the standard.
Publish a research article that uses the characters to demonstrate their usage within a programming language. The advantage of publishing a paper is that the encoding does not need to exist in the programming language yet; the article can display it hypothetically, for example as a code snippet showing the characters in use in an imagined programming language.
Oops, I remembered that wrong. The whole script and blackboard bold alphabets made it in, just not at contiguous code points.
That makes the nearly-complete super and subscript alphabets look a bit silly. Maybe a proposal to Unicode is all it would take.
That’s odd; perhaps just another case of a subset of those symbols being added at an earlier time in the Letterlike Symbols block.
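The non-contiguous layout is easy to see with Python's standard `unicodedata` module. The sketch below looks up each script capital by its character name, first in the mathematical alphanumeric block and, where that slot is reserved, under the older `SCRIPT CAPITAL X` name. The helper name `script_capitals` is just something I made up; the results depend on the Unicode version bundled with your Python.

```python
import string
import unicodedata


def script_capitals():
    """Return {letter: code point} for the script capitals A-Z.

    Tries the mathematical alphanumeric name first and falls back to the
    older "SCRIPT CAPITAL X" name when that slot is unassigned.
    """
    found = {}
    for letter in string.ascii_uppercase:
        try:
            ch = unicodedata.lookup(f"MATHEMATICAL SCRIPT CAPITAL {letter}")
        except KeyError:
            ch = unicodedata.lookup(f"SCRIPT CAPITAL {letter}")
        found[letter] = ord(ch)
    return found


if __name__ == "__main__":
    for letter, cp in script_capitals().items():
        print(f"{letter}  U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

The letters that resolve to a code point well below U+1D400 are the ones that were encoded earlier, which is why the run in the math block appears to start with ACDG.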
Regarding the general use of unicode characters in programming language identifiers, I just came across the very recent technical report Unicode Identifier and Pattern Syntax which seems quite relevant to this discussion. It’s nice to see that the technical report explicitly states:
Modifier letters (General_Category=Lm) are also included in the definition of the syntax classes for identifiers. Modifier letters are often part of natural language orthographies and are useful for making word-like identifiers in formal languages.
The Lm category appears to contain the existing Latin and Greek super and subscripts.
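A quick way to check this for yourself is to ask Python's `unicodedata` module for the general category of a few of these characters. This is only a spot check on a handful of hand-picked samples (the `SAMPLES` table and helper name are mine), and the answer reflects whatever Unicode version your Python ships with.

```python
import unicodedata

# Hand-picked samples; the descriptions are informal labels, not official names.
SAMPLES = {
    "\u1d43": "superscript Latin a",
    "\u2090": "subscript Latin a",
    "\u1d62": "subscript Latin i",
    "\u1d5d": "superscript Greek beta",
    "\u1d66": "subscript Greek beta",
    "\u00b2": "superscript digit two",
    "\u2082": "subscript digit two",
}


def general_categories(chars):
    """Return {character: General_Category} for the given characters."""
    return {c: unicodedata.category(c) for c in chars}


if __name__ == "__main__":
    for ch, cat in general_categories(SAMPLES).items():
        print(f"U+{ord(ch):04X}  {cat}  {SAMPLES[ch]}")
```

Note that the superscript and subscript digits come out as `No` (other number) rather than `Lm`, so the identifier rules quoted above cover the letters but not, by that clause alone, the digits.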
(It turns out that vim treats the super and subscripts inconsistently with the unicode standard so I did some more research and submitted https://github.com/vim/vim/issues/5038 — if anyone is interested in how text editors are “meant to” deal with unicode identifiers by default you could have a look at that.)