Cyrillic symbols in comments

When I use comments typed in Cyrillic in a file (“hello.jl”) and load the file with include(“hello.jl”), I get the error message: ERROR: LoadError: syntax: invalid UTF-8 syntax


println(“Hello !”) # Булгъар

It appears that the quotation marks you use are the problem, not the comments:

julia> println(“Hello !”) # Булгъар
ERROR: syntax: invalid character "“" near column 9
 [1] top-level scope at none:1

julia> println("Hello !") # Булгъар
Hello !

Use ASCII quotation marks.


The same file, but with a Latin-alphabet comment, works fine:

println(“Hello !”) # some comment

In the REPL, Cyrillic comments work too.

Probably you didn’t save the file in the UTF-8 encoding, but used some other encoding like UTF-16. What editor are you using?

An example in the REPL of invalid UTF-8 data can be generated by creating a string from random bytes:

julia> Meta.parse("# " * String(rand(UInt8, 10)))
ERROR: Base.Meta.ParseError("invalid UTF-8 sequence")
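
To check a saved file before loading it, one can read its raw bytes and test them with isvalid. This is only a minimal sketch: the helper name is_utf8_file and the two temporary files are made up for illustration, and it assumes the snippet itself is saved as UTF-8.

```julia
# Hypothetical helper (not part of Julia): true if the file's bytes form valid UTF-8.
is_utf8_file(path) = isvalid(String, read(path))

# Two temporary files for demonstration.
good = tempname()
bad = tempname()

# Julia string literals are UTF-8, so this file is written as valid UTF-8.
write(good, "println(\"Hello !\") # Булгъар\n")

# "# " followed by three bytes that are not valid UTF-8, then a newline.
write(bad, UInt8[0x23, 0x20, 0xc1, 0xf3, 0xeb, 0x0a])

println(is_utf8_file(good))  # true
println(is_utf8_file(bad))   # false
```

If is_utf8_file("hello.jl") returns false, the file was saved in some other encoding and needs to be converted in the editor.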

It does not depend on the editor; I have used several: Notepad++, TED Notepad,… For example, a file with comments like these works:

println(“Hello !”) # Հայերեն

println(“Hello !”) # სომეხური ნინა

println(“Hello !”) # میلیون نفر بۇ دیلده

Any decent editor will preserve the encoding of the file by default, so simply opening it and re-saving in another editor will not fix the encoding. You’ll have to change an editor setting somewhere to specify conversion to UTF-8. How to do this will vary with the editor.

For example, for Notepad++ see here. For TED Notepad there is an encoding option in the File menu. And so forth.

PS. I would strongly recommend using a modern programming editor like VS Code. (To change the encoding to UTF-8 in VS Code, there is a menu at the bottom of the file window.)


Well. But why is this line OK

println(“Hello !”) # Հայերեն

and this is not

println(“Hello !”) # Булгъар

And what does the editor have to do with it?

It depends on how it is encoded in whatever encoding you are using and whether that happens to correspond to a valid UTF-8 sequence.

The editor determines what the default encoding is and how to change it, as I explained in my message above.
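
To make this concrete, here is a small sketch comparing the bytes of the word Булгъар in two encodings. Windows-1251 is only an assumed example of a legacy Cyrillic encoding; the thread does not say which encoding the file was actually saved in.

```julia
# The word "Булгъар" as UTF-8 bytes (how Julia stores string literals).
utf8_bytes = Vector{UInt8}(codeunits("Булгъар"))

# The same word as Windows-1251 bytes (assumed legacy encoding, one byte per letter).
cp1251_bytes = UInt8[0xc1, 0xf3, 0xeb, 0xe3, 0xfa, 0xe0, 0xf0]

println(length(utf8_bytes))             # 14: two bytes per Cyrillic letter
println(isvalid(String, utf8_bytes))    # true
println(isvalid(String, cp1251_bytes))  # false: 0xc1 can never appear in valid UTF-8
```

Text in another script or another encoding produces different bytes, which may or may not happen to form valid UTF-8.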

Well. Did you try to make a file with the text

println(“Hello !”) # Булгъар

and run it in the REPL?

Yes. It works fine (both via julia foo.jl and by include("foo.jl") in the REPL), saved in UTF-8 encoding in a file foo.jl, once you correct the quotes to straight quotes:

println("Hello !") # Булгъар

(Use a programming editor like VS Code! Non-programming editors will sometimes “smart-correct” straight quotes "..." into curly quotes “...”, which is not what you want for programming. Or maybe your browser is doing that to your Discourse posts?)

Thank you very much!
(I use the right quotes; they are probably converted in this window.)
Yet I do not understand the idea. It is a COMMENT; it is for me, not for the compiler/interpreter. One should be able to use any symbols there.
I thought the compiler just skips the text after the # symbol.

It has to parse the text after # in order to find the newline character, so the comment still has to be valid UTF-8.

Thank you! I finally figured it out!