Check Unicode character class

Deuxis · September 6, 2019, 3:57pm

Hello, how can I check a Char’s Unicode character class? I need it for a lexer for a Unicode-enabled language:

I found that Char’s show method uses Unicode.category_abbrev, but trying to import that function gives me an error, and there doesn’t seem to be any documentation for it.
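For reference, this is the output being described: the `text/plain` show method for a Char prints the code point followed by its Unicode general category. A minimal sketch, assuming a Julia 1.x REPL; `sprint` is used only to capture the same text as a String.

```julia
# Print a Char the way the REPL displays it; the trailing "(category ...)"
# part is produced with Base.Unicode.category_abbrev / category_string.
c = 'A'
show(stdout, MIME("text/plain"), c)
println()
# In the REPL this prints a line like:
# 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

# Capture the same display as a String instead of printing it:
s = sprint(show, MIME("text/plain"), c)
```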

simeonschaub · September 6, 2019, 4:08pm

The Unicode module is not exported from Base, so you need to write the fully qualified name, Base.Unicode.category_abbrev.
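A quick way to try the fully qualified form is sketched below. Note that `category_abbrev` and its companion `category_string` are internal, undocumented helpers of Base.Unicode, so their names and output could change between Julia versions; treat this as an illustration rather than a stable API.

```julia
# No import is needed when the function is reached by its full path.
for c in ('A', 'λ', '7', '_', '+', ' ')
    abbrev = Base.Unicode.category_abbrev(c)   # two-letter code, e.g. "Lu"
    name   = Base.Unicode.category_string(c)   # human-readable description
    println(repr(c), " => ", abbrev, " (", name, ")")
end
```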

Deuxis · September 10, 2019, 9:59am

Wow, that’s weird: it doesn’t work even if I explicitly import the Unicode module, but it does work when I also specify the Base part like you said. Is there a reason for this behaviour? Is the Unicode module somehow different from the pre-imported Base.Unicode?

simeonschaub · September 10, 2019, 10:11am

Unicode is just a submodule of Base. The show method doesn’t need to write Base.Unicode because that code is defined inside the module Base, which has access to all of its submodules automatically. If you don’t want to write Base.Unicode every time, you can also put using Base.Unicode at the top of your code.
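Since the original goal was a lexer, here is a rough sketch of how the category could drive identifier scanning. It imports `category_abbrev` by name (`import Base.Unicode: category_abbrev`), which works whether or not Base.Unicode exports it. The identifier rules and the names `is_ident_start`, `is_ident_continue` and `scan_identifier` are hypothetical, loosely modelled on the ID_Start/ID_Continue idea from Unicode UAX #31; the real properties include extra code points this approximation ignores.

```julia
import Base.Unicode: category_abbrev

# Hypothetical rule set: letters and letter numbers may start an identifier;
# combining marks, decimal digits and connector punctuation may continue one.
const ID_START_CATEGORIES    = ("Lu", "Ll", "Lt", "Lm", "Lo", "Nl")
const ID_CONTINUE_CATEGORIES = (ID_START_CATEGORIES..., "Mn", "Mc", "Nd", "Pc")

is_ident_start(c::Char)    = c == '_' || category_abbrev(c) in ID_START_CATEGORIES
is_ident_continue(c::Char) = category_abbrev(c) in ID_CONTINUE_CATEGORIES

# Scan the longest identifier starting at (valid) index i of s.
# Returns (identifier, index just past it), or nothing if s[i] can't start one.
function scan_identifier(s::AbstractString, i::Int)
    is_ident_start(s[i]) || return nothing
    j = nextind(s, i)
    while j <= lastindex(s) && is_ident_continue(s[j])
        j = nextind(s, j)
    end
    return s[i:prevind(s, j)], j
end
```

Using `nextind`/`prevind` rather than `i + 1` matters here, because non-ASCII characters such as `α` occupy more than one byte in a Julia String.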

Deuxis · September 10, 2019, 11:32am

Yes, I just thought that importing a module simply brings it into scope, so I’m bewildered as to why this happens:

kristoffer.carlsson · September 10, 2019, 11:39am

You want import .Unicode.category_abbrev. There is a stdlib called Unicode as well, which is a separate module from Base.Unicode.
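The name clash can be seen directly: `import Unicode` loads the standard-library package, not Base's internal submodule, so names that only exist in Base.Unicode aren't reachable through it. A small sketch, assuming a default Julia 1.x environment where the Unicode stdlib is available; `normalize` is part of the stdlib's documented API.

```julia
# `import Unicode` resolves to the standard library package named Unicode.
import Unicode

println(Unicode === Base.Unicode)        # false: two distinct modules
println(parentmodule(Base.Unicode))      # Base

# The stdlib has its own documented API, e.g. normalize:
println(Unicode.normalize("JuLiA", casefold=true))   # julia

# The general-category helper lives in Base.Unicode, so import it from there:
import Base.Unicode: category_abbrev
println(category_abbrev('λ'))            # Ll
```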

Related topics

Topic	Category (tags)	Replies	Views	Activity
Accessing the category of a Char	General Usage (question, unicode)	4	312	August 13, 2023
Non-unicode versions of unicode functions in base/stdlib?	Internals & Design	10	1315	May 16, 2021
Assigning a function to unicode symbols	General Usage	1	365	March 13, 2020
What happened to parsing Unicode characters?	VS Code (question, unicode)	5	238	June 20, 2025
Invalid unicode variable	General Usage	3	1019	March 3, 2018