Cyrillic symbols in comments

When I put comments typed in Cyrillic in a file (“hello.jl”) and load the file with include(“hello.jl”), I get the error message: ERROR: LoadError: syntax: invalid UTF-8 syntax

Example:

println(“Hello !”) # Булгъар

It appears that the quotation marks you use are the problem, not the comments:

julia> println(“Hello !”) # Булгъар
ERROR: syntax: invalid character "“" near column 9
Stacktrace:
 [1] top-level scope at none:1

julia> println("Hello !") # Булгъар
Hello !

Use ASCII quotation marks.


The same file, but with a Latin-script comment, works fine:

println(“Hello !”) # some comment

In the REPL, Cyrillic comments work too.

Probably you didn’t save the file in the UTF-8 encoding, but used some other encoding like UTF-16. What editor are you using?

An example in the REPL of invalid UTF-8 data can be generated by creating a string from random bytes:

julia> Meta.parse("# " * String(rand(UInt8, 10)))
ERROR: Base.Meta.ParseError("invalid UTF-8 sequence")
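
To test a file on disk rather than a string in the REPL, one option is to read its raw bytes and ask Julia whether they form valid UTF-8. This is only a minimal sketch: `check_utf8` and the temporary file names are hypothetical, and the three non-UTF-8 bytes are an assumed stand-in for whatever the editor actually wrote.

```julia
# Hypothetical helper: report whether a file's raw bytes form valid UTF-8.
function check_utf8(path::AbstractString)
    bytes = read(path)              # raw bytes, no decoding is attempted
    return isvalid(String, bytes)
end

dir = mktempdir()
good = joinpath(dir, "good.jl")
bad = joinpath(dir, "bad.jl")

# Julia string literals are UTF-8, so this file is written as valid UTF-8.
write(good, "println(\"Hello !\") # Булгъар\n")

# Same code, but the comment holds bytes from a single-byte Cyrillic
# encoding (assumed here for illustration), followed by a newline.
bad_bytes = Vector{UInt8}("println(\"Hello !\") # ")
append!(bad_bytes, UInt8[0xc1, 0xf3, 0xeb, 0x0a])
write(bad, bad_bytes)

println(check_utf8(good))
println(check_utf8(bad))
```

If the check returns false for your own file, the file was not saved as UTF-8, whatever the editor displays.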

It does not depend on the editor; I have tried several (Notepad++, TED Notepad, …). For example, a file with comments like these works:

println(“Hello !”) # Հայերեն

println(“Hello !”) # სომეხური ნინა

println(“Hello !”) # میلیون نفر بۇ دیلده

Any decent editor will preserve the encoding of the file by default, so simply opening it and re-saving in another editor will not fix the encoding. You’ll have to change an editor setting somewhere to specify conversion to UTF-8. How to do this will vary with the editor.

For example, for Notepad++ see here. For TED Notepad there is an encoding option in the File menu. And so forth.

PS. I would strongly recommend using a modern programming editor like VS Code. (To change the encoding to UTF-8 in VS Code, there is a menu at the bottom of the file window.)
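
If changing the editor setting is not convenient, the conversion can also be sketched in Julia itself. The example below assumes the file was saved in Windows-1251, which the thread does not confirm, and the helper `cp1251_to_utf8` is hypothetical. It handles only ASCII and the basic letters А..я; a dedicated encoding package would be the more complete route.

```julia
# Hypothetical helper: decode Windows-1251 bytes into a Julia String (UTF-8).
# Only ASCII and the letters А..я (bytes 0xC0..0xFF) are covered; any other
# byte raises an error instead of being guessed at.
function cp1251_to_utf8(bytes::AbstractVector{UInt8})
    io = IOBuffer()
    for b in bytes
        if b < 0x80
            write(io, b)                          # ASCII is identical in both encodings
        elseif b >= 0xc0
            print(io, Char(0x0410 + (b - 0xc0)))  # 0xC0 maps to U+0410 (А)
        else
            error("byte 0x$(string(b, base = 16)) is outside this sketch's table")
        end
    end
    return String(take!(io))
end

# "Булгъар" as Windows-1251 bytes (assumed encoding, for illustration only).
word = cp1251_to_utf8(UInt8[0xc1, 0xf3, 0xeb, 0xe3, 0xfa, 0xe0, 0xf0])
println(word)
```

Writing the returned string back with `write(path, word)` stores it as UTF-8, since that is how Julia represents strings internally.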


Well. But why is this line OK

println(“Hello !”) # Հայերեն

and this is not

println(“Hello !”) # Булгъар

?
And what does the editor have to do with it?

It depends on how it is encoded in whatever encoding you are using and whether that happens to correspond to a valid UTF-8 sequence.

The editor determines what the default encoding is and how to change it, as I explained in my message above.
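
To make this concrete, here is a small sketch comparing the bytes of the same word in two encodings. Windows-1251 is used purely as an assumed example of a non-UTF-8 encoding; the thread does not establish which encoding the file was actually in.

```julia
# "Булгъар" as Julia stores it: UTF-8, two bytes per Cyrillic letter.
utf8_bytes = Vector{UInt8}("Булгъар")

# The same word as Windows-1251 would store it: one byte per letter.
# (Assumed encoding, for illustration only.)
cp1251_bytes = UInt8[0xc1, 0xf3, 0xeb, 0xe3, 0xfa, 0xe0, 0xf0]

# ASCII text has identical bytes in both encodings.
ascii_bytes = Vector{UInt8}("# some comment")

println(isvalid(String, utf8_bytes))    # the UTF-8 bytes are valid
println(isvalid(String, cp1251_bytes))  # 0xc1 can never begin a UTF-8 character
println(isvalid(String, ascii_bytes))   # ASCII-only comments are always valid
```

Whether a comment in another script happens to pass depends on which bytes the editor chose for it, and some editors switch to a Unicode encoding on their own when the text cannot be represented in the default one.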


Well. Did you try making a file with the text

println(“Hello !”) # Булгъар

and running it in the REPL?

Yes. It works fine (both via julia foo.jl and by include("foo.jl") in the REPL), saved in UTF-8 encoding in a file foo.jl, once you correct the quotes to straight quotes:

println("Hello !") # Булгъар

(Use a programming editor like VS Code! Non-programming editors will sometimes “smart-correct” straight quotes "..." into curly quotes “...”, which is not what you want for programming. Or maybe your browser is doing that to your Discourse posts?)


Thank you very much!
(I use the right quotes, probably they are converted in this window).
Yet I do not understand the idea. It is a COMMENT; it is for me, not for the compiler/interpreter. One should be able to use any symbols there.
I thought the compiler just skips the text after the # symbol.

It has to parse the text after # in order to find the newline character that ends the comment.

Thank you! I finally figured it out!