Accented characters in VS Code

In VS Code, if I type θ \hat tab, the hat is displayed after the θ, but the variable is usable.

θ̂ = 2.0

If instead I type \hat tab θ, the hat sits on top of the θ, but Julia gives an error.

This issue is not specific to Greek characters: the same happens with x.

Is there a setting in VS Code that I could change to make it both look right and not produce an error message? (I’ve looked but failed to find one.) Thanks!

What font are you using? If I type x \hat, it produces the correct Unicode character. This seems like a font issue to me.

Thanks! I’m using 'Droid Sans Mono', 'monospace', monospace, 'Droid Sans Fallback', which I suspect is the default. I’ll experiment, but which font are you using, @affans?

That definitely is a font issue. I’d recommend giving the excellent JuliaMono a shot. :slight_smile:

JuliaMono seems to be a fan favourite, but I use Cascadia.

You might find this useful: https://mono-math.netlify.app/ :slight_smile: It shows how different monospace fonts compare for math symbols.

θ \hat should work, but I don’t think \hat θ will, because hat needs a character to sit on.

You can stack these too, so x \dot \vec should work (in the VS Code editor), font permitting.
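
As a rough sketch of what stacking means at the character level: assuming \dot tab-completes to U+0307 (combining dot above) and \vec to U+20D7 (combining right arrow above), the resulting name is just three code points in a row. The variable `name` below is only an illustration.

```julia
# "x" followed by U+0307 (combining dot above) and U+20D7 (combining right arrow above)
name = "x\u0307\u20d7"

println(name)                     # how this is drawn depends entirely on the font
println(length(name))             # number of code points in the string
println(Base.isidentifier(name))  # whether Julia accepts it as a variable name

# List the code points in hexadecimal to see the base letter and the two marks
println([string(codepoint(c), base = 16) for c in name])
```

Whether the dot and the arrow actually land on top of the x is up to the font and the text renderer; Julia only sees the sequence of code points.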

Realize that this is not really LaTeX. You are not typing \hat{\theta} and having it typeset.

What is actually happening is that tab-completing \hat gives you the Unicode combining character U+0302 (combining circumflex accent). The way combining characters work in Unicode is that they modify the preceding character, in this case by putting a ^ over it. So you have to put the \hat after the θ if you want a θ with a hat on top of it. (Though it may not render correctly if your font doesn’t support this combination.)

In addition to not rendering the way you want, putting the \hat first is not a valid Julia variable name — valid variable names must not begin with a combining character.
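
A minimal sketch of the two orderings, for anyone who wants to check this without relying on how their font draws things. The names `hat`, `after`, and `before` are made up for the example, and `Base.isidentifier` is an unexported helper, so treat this as illustrative rather than as a stable API.

```julia
hat = '\u0302'        # COMBINING CIRCUMFLEX ACCENT, the character discussed above

after  = "θ" * hat    # θ first, then the hat: the hat modifies θ
before = hat * "θ"    # hat first: there is no preceding character for it to modify

println(Base.isidentifier(after))   # true
println(Base.isidentifier(before))  # false

# The valid ordering parses and evaluates like any other assignment
ex = Meta.parse(after * " = 2.0")
println(eval(ex))
```

The first string is a normal identifier that happens to contain two code points; the second starts with a combining character, which is what the parser rejects.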

Hi, I’m a bit late to this conversation, but I still have some questions. I had the same problem as OP with Droid Sans Mono, and I’ll try out some other fonts.

What I don’t understand is why Droid Sans Mono would do this. The Unicode specification is that combining characters combine with the previous character, so why would they ever combine with the next character?

Based on the little testing I’ve done, when you have a Greek letter, a combining character, then another character, the middle character combines with the last character, even if that one is also a Greek letter. If the first character is an English letter, then it will combine properly. I’ve also noticed that punctuation behaves similarly to Greek letters in the above examples.

I think this massive thread provides enough evidence to discourage the use of unicode characters in identifiers in computer code.
Example?

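julia> function f!(M)      # the parameter is Latin capital M (U+004D)
       Μ = [1, 2]          # assigns to a new local named Greek capital Mu (U+039C)
       end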
f! (generic function with 1 method)

julia> M = [3, 4]
2-element Vector{Int64}:
 3
 4

julia> println(M)
[3, 4]

julia> f!(M)
2-element Vector{Int64}:
 1
 2

julia> println(M)
[3, 4]

How do you like 'em unicodes now?

lol - But I don’t think this argument is very persuasive. We don’t ban scissors merely because it’s possible to cut yourself with them. Instead we teach and advise how to use things safely, just like we would advise programmers not to use easily-confused capital Greek letters for variable names (such as Μ, U+039C) – without at least documenting their folly!
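
One way to follow that advice in practice is to check which characters in a line of code are not ASCII. The sketch below is only an illustration: `non_ascii` is a hypothetical helper, not something from Julia or any package, and the characters are written as escapes so that the two look-alike letters cannot be mixed up while reading.

```julia
latin_M = 'M'        # U+004D LATIN CAPITAL LETTER M
greek_M = '\u039c'   # U+039C GREEK CAPITAL LETTER MU, looks identical in most fonts

println(latin_M == greek_M)   # false: different code points, so different identifiers
println(isascii(latin_M), " ", isascii(greek_M))

# Hypothetical helper: list each non-ASCII character in `src` with its code point
function non_ascii(src)
    return [(c, "U+" * uppercase(string(codepoint(c), base = 16, pad = 4)))
            for c in src if !isascii(c)]
end

println(non_ascii("\u039c = [1, 2]"))   # the Greek Mu is reported
println(non_ascii("M = [1, 2]"))        # nothing to report for the Latin M
```

A check like this would of course also flag intentional Greek letters such as θ, so it is more of a review aid than a rule.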

I started investigating this, but got a bit bogged down and wasn’t able to come up with a definitive explanation.

I made this graphic in an attempt to explain what was going on:

Basically this works through a number of monospaced fonts and draws the U+0302 combining circumflex accent from each one. The central red cross is essentially the current position or ‘cursor’, usually the bottom left corner of a typical character. You can see that fonts generally fall into two groups. The first group places the circumflex to the left of the cursor, i.e. over the previous glyph, and the second group positions the circumflex to the right, i.e. over the next glyph. There are a few fonts that appear to do neither. Not shown are some fonts that don’t have this character at all: Anonymous Pro, Roboto Mono, JetBrains, etc.

The numbers are the X-advance value of this glyph, which in theory defines the width of the character. For the first group, because the width is 0, the next character appears at the same place: there’s effectively no advance forward. (I think this corresponds to what the Unicode specification says.) In the second group, there is a non-zero X-advance value, so these are, in theory at least, not zero-width glyphs. But, to make matters more complicated, the text rendering software (i.e. your terminal or editor, or the OpenType text-rendering layer) may or may not take notice of the varying X-advance values, because the “monospaced context” overrides any positioning instructions.
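
For comparison, here is what Julia itself reports as the display width of this character. This is only loosely related to the X-advance stored in a font: `textwidth` works from Unicode character data, not from any font file, so it says nothing about how a particular font draws the glyph. It is included as a sketch of the zero-width expectation, not as an explanation of the font behaviour.

```julia
hat = '\u0302'                  # COMBINING CIRCUMFLEX ACCENT

println(textwidth(hat))         # columns Julia assigns to the bare combining mark
println(textwidth("θ" * hat))   # columns for θ followed by the mark
println(length("θ" * hat))      # number of code points, for comparison
```

If the mark takes no columns of its own, then θ plus the mark occupies the same single column as θ alone, which matches the first group of fonts described above.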

The problem starts, of course, back when the font is designed. Some font design applications set the width of these combining characters to zero, even though the graphics obviously have some width. Other programs allow these characters to have non-zero widths…

I think the group on the left provides the correct behaviour. But I’m not 100% sure.

Well, I got this far, then kind of gave up. Fonts are tricky… :rofl:

This is patently wrong: there are thousands of mathematical libraries that contain not a single unicode character. What makes them legible (or not) is not the characters, it is the software design.

Again, not impossible. It has been done many times already. BTW: if you say it is crucial to employ unicode, you should be able to point to at least one library that demonstrates this. Can you?

You claimed it was impossible. You prove it.

Hence it is possible. QED

Listen. I guess you’re not that used to tongue-in-cheek hyperbole. I’m clearly being over-the-top and deliberately irrational.

My point is, I’ve written a bunch of mathematical codes without Unicode, and it really, really sucks. When I am able to bring in a sprinkling of Greek letters, and maybe some other decorations, it cleans the code up dramatically. It’s a real life-saver: it doesn’t just make the code easier to understand and much more concise, it also reduces typos, bugs, and even mathematical errors.

As far as I’m concerned, Unicode symbols in computer code are the best invention in the history of the universe, and I won’t let anyone tell me they’re a bad idea.

I have written a lot of code with math, and I never needed a Unicode character. Introducing characters which look the same but have different meanings into the code is a bad idea, period.

Yes you did, you just didn’t realize it :wink:

Doing that would be terrible. But your example didn’t demonstrate this, probably because you forgot to mutate inside f!.
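
To spell out the point about mutation: an assignment like `M = [1, 2]` inside a function only rebinds a local name, so the caller’s array would be unchanged even with plain ASCII names. A minimal sketch, with the hypothetical names `rebind!` and `mutate!`:

```julia
# Rebinding: `M = ...` makes the local name M point at a new array.
# The array the caller passed in is not touched.
function rebind!(M)
    M = [1, 2]
    return M
end

# Mutating: `.=` writes the new values into the array that was passed in.
function mutate!(M)
    M .= [1, 2]
    return M
end

A = [3, 4]
rebind!(A)
println(A)   # [3, 4]
mutate!(A)
println(A)   # [1, 2]
```

So a version of the earlier example that actually demonstrated a look-alike bug would need to mutate its argument and then get the name wrong.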

Anyway, just for fun, I’ll prove my assertion.

Definition: Maths-heavy code is code that is unreadable without unicode symbols.

It follows that one cannot write readable maths-heavy code without unicode.

This thread was about accented characters. :frowning_with_open_mouth:
