Check for letters-only string

Python can check whether a string contains only letters with isalpha():

a_few = "abcdefghijk"
a_few.isalpha()
True

mixed = "abc45defg"
mixed.isalpha()
False

I have found Julia’s isletter(), but it only checks a single character.
How can Julia check for a letters-only string?

You can apply isletter to all the characters in a string like so:

julia> all(isletter, test)
false

julia> a_few = "asfeaer"
"asfeaer"

julia> mixed = "adasd34asda"
"adasd34asda"

julia> all(isletter, a_few)
true

julia> all(isletter, mixed)
false
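
One difference from Python worth keeping in mind: "".isalpha() returns False, while all(isletter, "") returns true, because all over an empty collection is vacuously true. Below is a minimal sketch of a wrapper that also rejects the empty string. The name isalpha_like is hypothetical and not part of Julia:

```julia
# Hypothetical helper: true only for non-empty strings made up entirely of letters.
isalpha_like(s::AbstractString) = !isempty(s) && all(isletter, s)

isalpha_like("asfeaer")      # true
isalpha_like("adasd34asda")  # false
isalpha_like("")             # false, whereas all(isletter, "") is true
```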

Thank you!!

Works well for what I was looking for.

I am not sure how much you care about performance or Unicode support, but this checks whether all the letters are a to z, and it’s around 4x faster than the more general isletter version:

julia> text = "a" ^ 1_000_000;

julia> @btime all(isletter, $text)
  6.489 ms (0 allocations: 0 bytes)
true

julia> function isatoz(text)
           all(c -> 0x61 <= UInt8(c) <= 0x7A, text)
       end
isatoz (generic function with 1 method)

julia> @btime isatoz($text)
  1.774 ms (0 allocations: 0 bytes)
true

Very interesting.

In the particular case I was working on, speed does not matter, but I like the transparency and precision of what you are suggesting here. I don’t know what is under the hood of isletter(). The function you define here does not leave much room for doubt.

Thank you too!!

You can try @edit isletter('a') in the Julia REPL; it should open your default editor with the source of that method. It’s very helpful to see what’s under the hood.

The difference is that isletter detects not only the letters A-Z and a-z, but also Unicode letters from other alphabets, for example accented letters, umlauts, or Greek letters. Naturally, this is quite a bit more complicated, but it’s important if you want to support languages other than English.
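
To make the difference concrete, here is a small sketch. The helper name isasciiletters is hypothetical; isascii and isletter are Base functions.

```julia
# isletter accepts letters from any script, not only English ones.
isletter('é')   # true
isletter('ß')   # true
isletter('α')   # true
isletter('3')   # false

# Hypothetical helper: restrict to ASCII letters by also requiring isascii.
isasciiletters(s::AbstractString) = all(c -> isascii(c) && isletter(c), s)

isasciiletters("Julia")  # true
isasciiletters("Jülia")  # false
```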

I tried what @tamasgal suggested, @edit isletter('a'), to see its structure. I did not know that this was available.
I can see how isletter() will pick up letters that are not English, as you mention here. This makes it unsuitable for the application I was working on. I should instead have defined a function like the one @tamasgal demonstrated, because the user input it handles must consist of English letters, and isletter() was used in a data validation step to verify that it is … an English letter.

Appreciate very much the valuable input from you all.
Thanks again!

Note that @tamasgal’s example only detects lowercase letters. To also detect uppercase letters, you need to check against 0x41 through 0x5A as well. It’s probably also easier to understand if you write this as 'a' <= c <= 'z' || 'A' <= c <= 'Z': you can compare Chars directly, so you don’t have to convert to UInt8 first and remember all the ASCII codes.
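
Written out as a function, this could look like the sketch below. The name isenglishletters is hypothetical. A side benefit of comparing Chars directly: as far as I know, UInt8(c) throws an error (an InexactError) for characters whose code point does not fit in a byte, for example '日', whereas the comparison simply returns false.

```julia
# Hypothetical helper: true if every character is an ASCII letter, a-z or A-Z.
isenglishletters(s::AbstractString) =
    all(c -> 'a' <= c <= 'z' || 'A' <= c <= 'Z', s)

isenglishletters("Hello")   # true
isenglishletters("héllo")   # false: 'é' is outside both ranges
isenglishletters("abc45")   # false
isenglishletters("日本")     # false, and no conversion error
```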

Excellent point. I called lowercase() on the user input to avoid this problem, but writing it the way you do removes the concern. Using the actual characters instead of their corresponding codes is clearer to read.
If I had to choose between ASCII codes and characters the next time I need this, I would look for a reason or particular circumstances that favor one over the other. I don’t know what those might be, if any.
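
One circumstance where the numeric codes seem natural is when working on the raw bytes of a string rather than on its characters. Here is a rough sketch, assuming the input is a String (UTF-8 encoded); the name isasciiletters_bytes is hypothetical. In UTF-8, every byte of a non-ASCII character is 0x80 or higher, so such characters fail both range checks:

```julia
# Hypothetical byte-level check: inspects the UTF-8 code units instead of Chars.
function isasciiletters_bytes(s::String)
    all(b -> 0x61 <= b <= 0x7a || 0x41 <= b <= 0x5a, codeunits(s))
end

isasciiletters_bytes("Hello")  # true
isasciiletters_bytes("héllo")  # false
isasciiletters_bytes("abc45")  # false
```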