REPL reports "ParseError: unknown unicode character" when I copy-paste code containing Unicode characters into the VS Code terminal

Continuing the discussion from Issue with using Unicode in symbol names:

In my application, it is sometimes natural to use variant styles of letters (e.g. monospace letters such as \ttg).
Most of the time they work properly with my copy-paste workflow into the Julia REPL.
But occasionally (maybe 10% of the time), I get the error in the title.
Here is an example:

julia> JuMP.@constraint(𝙶, sum(𝚐) ≤ b)
𝚐[1] + 𝚐[2] + 𝚐[3] + 𝚐[4] + 𝚐[5] + 𝚐[6] + 𝚐[7] + 𝚐[8] + 𝚐[9] + 𝚐[10] <= 5

julia> @set_objective_function(𝙶, sm(𝙹_val, g2d(��)) - sm(𝙸_val, z))
ERROR: ParseError:
# Error @ REPL[87]:1:42
@set_objective_function(𝙶, sm(𝙹_val, g2d(��)) - sm(𝙸_val, z))
#                                        ╙ ── unknown unicode character '�'
Stacktrace:
 [1] top-level scope
   @ none:1

julia> Base.isidentifier(:𝚐)
true

At the first julia> prompt, the code is accepted without error; there I used \ttg.
But at the second julia> prompt, the same symbol is not recognized (the "unknown unicode character" is essentially \ttg).

How can I avoid this error?

By the way, some identifiers are very stable and hardly ever cause errors, e.g. the \o symbol and the Greek letters. (But they are not enough for me.)

Strangely, immediately after I made this post, it worked properly again.

julia> JuMP.@constraint(𝙶, sum(𝚐) ≤ b)
𝚐[1] + 𝚐[2] + 𝚐[3] + 𝚐[4] + 𝚐[5] + 𝚐[6] + 𝚐[7] + 𝚐[8] + 𝚐[9] + 𝚐[10] <= 5

julia> @set_objective_function(𝙶, sm(𝙹_val, g2d(𝚐)) - sm(𝙸_val, z))

julia> 

It's as if there were a hidden rand() check:

if rand() < 0.1
    error("ParseError")
end

Where are you copying the text from? Some apps reformat text automatically and mess up characters (quotation marks like " are a common issue), but I'd expect that to happen consistently, not just 10% of the time.

Both of the “mysterious”/“unknown” characters you’ve posted seem to be:

julia> '�'
'�': Unicode U+FFFD (category So: Symbol, other)

julia> '�'
'�': Unicode U+FFFD (category So: Symbol, other)

(copy pasted from your post)

\ttg, when typed into the REPL and completed by pressing TAB, is this:

julia> '𝚐'
'𝚐': Unicode U+1D690 (category Ll: Letter, lowercase)

which is definitely not the same.
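To check which character actually arrived, one can list the code points of every non-ASCII character in a pasted line instead of comparing glyphs by eye. A minimal sketch; `nonascii_report` is a hypothetical helper name, and the test string is written with escapes (`\U1d690`, `\ufffd`) so the snippet itself survives copy-paste:

```julia
# List each non-ASCII character with its code point, flagging U+FFFD,
# the Unicode replacement character.
function nonascii_report(s::AbstractString)
    report = Tuple{Char,String,Bool}[]
    for c in s
        isascii(c) && continue
        hex = "U+" * uppercase(string(codepoint(c), base = 16, pad = 4))
        push!(report, (c, hex, c == '\ufffd'))   # true marks a replacement character
    end
    return report
end

# The real teletype g (U+1D690) next to the two characters seen in the error.
for (c, hex, isrepl) in nonascii_report("g2d(\U1d690) vs g2d(\ufffd\ufffd)")
    println(hex, "  ", c, isrepl ? "  <- replacement character" : "")
end
```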


I use the default VS Code copy-paste-to-REPL workflow. I just ran into this issue again, see:


Do I need to change some settings of the VS Code text editor?

Essentially I was using this symbol.

Unfortunately, judging from today's experience, the probability of a ParseError is much higher than 10%.
I'm using Windows 11, with the default VS Code settings.

[Screenshot: VS Code with the Julia REPL running in the PowerShell terminal panel]

By comparison, include("src/reform.jl") is more reliable, but it is not as convenient as the copy-paste workflow.
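A possible middle ground, sketched under the assumption that the corruption happens in the terminal paste path rather than in the clipboard itself: copy the selection as usual, but let Julia read the clipboard and evaluate it with `include_string`, refusing to run anything that already contains U+FFFD. `run_checked` is a hypothetical helper, not part of Julia or julia-vscode; `clipboard()` (from InteractiveUtils) and `include_string` are standard Julia functions. This has not been tested against the bug in this thread.

```julia
# Hypothetical helper: evaluate code given as a string (e.g. the clipboard)
# instead of pasting it into the terminal.
function run_checked(code::AbstractString; mod::Module = Main)
    n = count(==('\ufffd'), code)   # U+FFFD means the text was already damaged
    n == 0 || error("code contains $n U+FFFD replacement character(s); not evaluating it")
    return include_string(mod, code, "clipboard")   # value of the last expression
end

# Usage in the REPL after copying a selection in VS Code
# (`clipboard` comes from InteractiveUtils, which the REPL loads by default):
#     run_checked(clipboard())
```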

Besides one of those characters not being supported in Symbols, it doesn’t even seem plausible to accidentally mix up the code units. It really just changed that specific character without affecting the surroundings. Apparently it’s a “replacement character used to replace an unknown, unrecognised, or unrepresentable character,” but I can’t imagine why 𝚐 is only sometimes recognized.

julia> bitstring.(codeunits("𝚐"))
4-element Vector{String}:
 "11110000"
 "10011101"
 "10011010"
 "10010000"

julia> bitstring.(codeunits("��"))
6-element Vector{String}:
 "11101111"
 "10111111"
 "10111101"
 "11101111"
 "10111111"
 "10111101"
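The byte counts above fit one reading: 𝚐 is a single character above U+FFFF, which takes 4 bytes in UTF-8 but a surrogate pair (two code units) in UTF-16, while the damaged text holds exactly two U+FFFD characters of 3 bytes each. The sketch below only computes those representations; it does not establish which layer split the character.

```julia
g = '\U1d690'                     # MATHEMATICAL MONOSPACE SMALL G
s = string(g)

utf8_len = ncodeunits(s)          # 4 UTF-8 bytes: F0 9D 9A 90
utf16 = transcode(UInt16, s)      # two UTF-16 code units (a surrogate pair)

# Recompute the surrogate pair by hand to show where the two halves come from.
v = codepoint(g) - 0x10000        # offset above the Basic Multilingual Plane
hi = 0xd800 + (v >> 10)           # high (leading) surrogate: top 10 bits
lo = 0xdc00 + (v & 0x3ff)         # low (trailing) surrogate: bottom 10 bits

# Each U+FFFD is 3 bytes in UTF-8, so two of them give the 6 bytes seen above.
fffd_bytes = collect(codeunits(string('\ufffd')))
```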

To clarify, do you mean executing a selected block of code with Alt+Enter? Or are you manually copying lines from the source code and manually pasting into the VS Code REPL? To be clear, either way shouldn’t change characters like this.

I also suggest changing the title of your post. You're not trying to parse an unsupported character; a supported character really is changing sometimes when you copy and paste lines within VS Code, which would be just as bad if it changed into another supported character. I wonder, would you run into the same issue if you pasted into a REPL outside of VS Code, in a separate terminal? I, for one, have tried copying and pasting the 𝚐 character around and can't reproduce this issue no matter where the REPL is running, also on Windows 11.


It might be worth trying some of the other characters in the U1… blocks to see whether there's a general issue with the higher Unicode numbers. A problem with just the Teletype g seems unlikely.
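A quick way to see which identifiers in a file sit in those higher blocks is to list every character above U+FFFF (outside the Basic Multilingual Plane). Greek letters and ø (`\o`) are below that limit, while monospace letters such as `\ttg` are above it, which would be consistent with the earlier observation that Greek letters and `\o` are stable. `astral_chars` is a hypothetical helper name:

```julia
# Distinct characters above U+FFFF in a piece of source text, sorted by code point.
astral_chars(code::AbstractString) = sort!(unique(c for c in code if codepoint(c) > 0xffff))

sample = "\u03b1 + \u00f8 + \U1d690 + \U1d673"   # α, ø, 𝚐, 𝙳
for c in astral_chars(sample)
    println(c, "  U+", uppercase(string(codepoint(c), base = 16)))
end

# On a real file: astral_chars(read("src/r.jl", String))
```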


No, I don't even know what that action is.
I select a block of code, right-click, and choose "Copy" from the menu. Then I move to the PowerShell panel on the left, where the Julia REPL is running (see my image above), and right-click there (the default VS Code behavior) to paste.

I haven't tried it yet, but I'm not prepared to work anywhere else. VS Code is a good development environment.

I just sent you a file r.jl via DM. (Did it arrive?)
If you use include("src/r.jl"), it runs and produces results.
But if you copy-paste it, you should see ERROR: ParseError:...
Note that you don't actually have to run my code, since you probably don't have the dependencies installed (e.g. Gurobi.jl). Here is a trick to reproduce the problem: enclose some of my code in quote ... end. I've tried it, and it also reproduces the ERROR: ParseError:...
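The quote ... end trick works because only the parser runs. The same check can be done without pasting anything by handing both versions of a line to `Meta.parse`. A minimal sketch with a hypothetical `parses` helper; the ASCII names `G`, `J_val`, `I_val` are placeholders for the original monospace ones:

```julia
# true if `code` parses as one Julia expression, false if the parser rejects it.
function parses(code::AbstractString)
    try
        Meta.parse(code)   # parse only; nothing is evaluated or macro-expanded
        return true
    catch
        return false
    end
end

good = "@set_objective_function(G, sm(J_val, g2d(\U1d690)) - sm(I_val, z))"
bad  = replace(good, '\U1d690' => "\ufffd\ufffd")   # what arrived in the REPL

println("with U+1D690: ", parses(good), ", with two U+FFFD: ", parses(bad))
```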

Therefore I think this is a bug.

That's the thing: it shouldn't be any different from opening PowerShell (currently the Windows default terminal) separately and starting a Julia REPL there.

Thanks for the tip; it makes sense that the ParseError flags Julia code specifically, not text inside string quotes and such. I tried it again this way and still can't trigger the error, whether I stay inside VSCode, stay outside VSCode, or go back and forth to a separate PowerShell process; I keep getting the proper 𝚐 character (as well as the other highlighted almost-ASCII characters), not the �� replacement characters. I think we might need other Windows users to look at this; this is starting to look like an encoding problem.

It’s a different julia-vscode specific way to execute selected code, has some perks but I don’t expect it to be relevant to this issue.


Could it just be an encoding issue? Make sure the terminal is set to use UTF-8, not Windows-1252.
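One way to weigh the code-page idea is to look at what a code-page mix-up would produce. The sketch decodes the UTF-8 bytes of 𝚐 one byte per character, using Latin-1 as a simplified stand-in for Windows-1252 (the two differ in the 0x80 to 0x9F range). Such a mix-up would typically turn the character into several unrelated characters rather than exactly two U+FFFD with intact neighbours, so this is an illustration for comparison, not a diagnosis:

```julia
utf8_bytes = codeunits("\U1d690")           # the 4 UTF-8 bytes of 𝚐
as_latin1 = String(map(Char, utf8_bytes))   # reinterpret each byte as its own character

println(repr(as_latin1))   # 4 characters starting with 'ð', no U+FFFD
```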


It’s repeatable. If I copy lines 63 to >=69 of r.jl (not the same file as original post) via either VSCode or a separate text editor, the double replacement characters replace the 2nd 𝙻 every time it’s pasted into a quote in the REPL in a Powershell process via the VSCode terminal, NOT in the REPL in a separate Powershell process or the file itself. The same effect occurs for the 2nd 𝙳 in lines 62 to >=63:

julia> quote
       𝙳 = Model("st2_dual_lMs");
       JuMP.@variable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
ERROR: ParseError:
# Error @ REPL[68]:3:16
𝙳 = Model("st2_dual_lMs");
JuMP.@variable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
#              ╙ ── unknown unicode character '�'

The effect does not occur if I select more preceding lines, though I obviously haven’t exhausted every selection possible. I’m much more confident now that this is specific to pasting in the REPL via VS Code, and it’s more deterministic than what unrepeated copies suggest.

It doesn’t strictly need a macro call:

julia> quote
       𝙳 = Model("st2_dual_lMs");
       JuMPtavariable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
ERROR: ParseError:

nor does it necessarily occur with a macro call:

julia> quote
       𝙳 = Model("st2_dual_lMs");
       @foo(𝙳, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
       end

and edits can result in a different character being replaced:

julia> quote
       𝙳 = Model("st2_dual_lMs");
       @blah(𝙳, 0 <= ��[i = 1:M] <= 𝚁[i]);
ERROR: ParseError:

To me this sounds like something going wrong outside of Julia. Can you check whether the character is also replaced when pasted somewhere else? Maybe your locale/Windows language settings have something to do with it?

I can’t speak for WalterMadelim, but for me, it doesn’t seem like it’s about a particular ASCII-resembling character (VSCode highlights these). Major edits like copying more lines or minor edits like a name or spacing affect whether the pasting issue occurs and which character is replaced. I’m pretty sure it’s particular to pasting in the VSCode terminal, pasting just about everywhere else didn’t cause this (besides what I mentioned earlier, I also tried Microsoft Word).

I’m using English (United States), Windows 11.

I noticed that I didn’t quite write this out, so anyone else on Windows 11 can try copying the below and pasting it into a REPL (v1.11.4, v1.10.9, v1.9.4) via the VSCode Terminal:

𝙳 = Model("st2_dual_lMs");
JuMP.@variable(𝙳, 0 <= 𝙸[i = 1:M] <= 𝚁[i]); # replaces 𝙳

or

𝙳 = Model("st2_dual_lMs");
ablah(𝙳, 0 <= 𝙸[i = 1:M] <= 𝚁[i]); # replaces 𝙸

The pasting issue does not occur with minor changes like removing the semicolons, adding indentations, spacings, or a preceding quote in the copy. It’s this exact text that must be copied, and it doesn’t seem to matter what you type before the paste. Typing quote or """ just saves us UndefVarErrors if we didn’t bother with the necessary definitions and imports.
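This sensitivity to the exact text is what one would expect if the paste were delivered in fixed-size pieces and a character above U+FFFF happened to straddle a piece boundary, with each half decoded separately into U+FFFD. The simulation below is only a hypothesis sketch: `simulate_chunked_paste` and its `chunk` size (counted in UTF-16 code units) are made up, and nothing in this thread confirms how VS Code actually buffers a paste (a UTF-8 byte stream cut mid-character could give a similar symptom). It does show why adding a space or removing a semicolon can move the damage or make it disappear, and why characters below U+FFFF (Greek letters, ø) are never split this way:

```julia
# Hypothetical model: split the text into chunks of `chunk` UTF-16 code units
# and decode each chunk on its own, replacing unpaired surrogates with U+FFFD.
function simulate_chunked_paste(text::AbstractString, chunk::Integer)
    units = transcode(UInt16, text)
    out = IOBuffer()
    for start in 1:chunk:length(units)
        part = units[start:min(start + chunk - 1, length(units))]
        i = 1
        while i <= length(part)
            u = part[i]
            if 0xd800 <= u <= 0xdbff && i < length(part) && 0xdc00 <= part[i+1] <= 0xdfff
                # complete surrogate pair inside this chunk: decode normally
                hi, lo = UInt32(u), UInt32(part[i+1])
                print(out, Char(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00)))
                i += 2
            elseif 0xd800 <= u <= 0xdfff
                print(out, '\ufffd')   # half of a pair cut off by the chunk boundary
                i += 1
            else
                print(out, Char(u))    # ordinary BMP character
                i += 1
            end
        end
    end
    return String(take!(out))
end

# Which (made-up) chunk sizes would damage the line from this thread?
line = "JuMP.@variable(\U1d673, 0 <= \U1d678[i = 1:M] <= \U1d681[i]);"
damaged = [n for n in 2:64 if occursin('\ufffd', simulate_chunked_paste(line, n))]
```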


I’m on linux, and I can reproduce this.

Pasting this into my VSCode julia repl I get this:

julia> 𝙳 = Model("st2_dual_lMs");
ERROR: UndefVarError: `Model` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> JuMP.@variable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);

Here’s my versioninfo:

Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

Pasting it into a Julia session started from my terminal does not reproduce the error.


Yes, I agree with this point. My code r.jl, which contains many Unicode characters, runs reliably in a standalone Windows CMD or Windows PowerShell, but it produces a ParseError in the terminal within VS Code.

This example is spot on. In my VS Code environment it consistently produces:

PS K:\uc24> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> 𝙳 = Model("st2_dual_lMs");
ERROR: UndefVarError: `Model` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> JuMP.@variable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
ERROR: ParseError:
# Error @ REPL[2]:1:16
JuMP.@variable(��, 0 <= 𝙸[i = 1:M] <= 𝚁[i]);
#              ╙ ── unknown unicode character '�'
Stacktrace:
 [1] top-level scope
   @ none:1

julia> 

Your idea is plausible, but to be honest, I don't know how to change these settings in VS Code. I browsed some websites and asked ChatGPT, but I still have no idea how to configure it.

This issue seems to contain some possible solutions: How to set Integrated Terminal default encoding to UTF-8? · Issue #19837 · microsoft/vscode · GitHub

I've read that GitHub thread. None of settings 1, 2, and 3 work on my computer. Windows is really frustrating; I would consider switching to another OS for my next laptop. It seems there are few Windows users in the Julia community.