Separating Numbers from Letters (from anything else) using Regex

I’m trying to incorporate regular expressions into some of my pattern-matching functions, but I’m still having a hard time figuring out how to use them to split my strings.

Starting off easily, I want to separate numbers from characters in strings. So,
1a2b3c becomes ["1", "a", "2", "b", "3", "c"] and
123abc456cde becomes ["123", "abc", "456", "cde"]

Then, I want to group everything that is not a number or a letter into its own substring, so
12a+)3b would become ["12", "a", "+)", "3", "b"]
Ideally, spaces would be part of the symbol group, but it’s not a problem if (because of regex) they have to go in the letters group.

I know I can separate the strings by going character by character and checking whether the current character is of the same kind as the previous one, but I wanted to use the powerful regular expressions and match instead of loops.

Thanks a lot in advance.

With a regex such as rg = r"([[:alpha:]]+|[[:digit:]]+|(?:[[:punct:]]|[[:blank:]])+)", you can separate the runs of like characters. Here I’m using POSIX character classes, but you could replace them with more specific lists if you know what will and won’t appear in your text. Whitespace has its own class, separate from letters, so you can choose to group it with the letters or with the punctuation; here it is grouped with the punctuation. (There is also the [[:space:]] class, which matches spaces, tabs, and newlines, among other whitespace.) The website https://regex101.com/ is a great tool for writing and testing regexes.

str1 = "1a2b3c"
str2 = "123abc456cde"
str3 = "12a +) 3b"

(Notice I added some spaces to str3 to demonstrate the grouping of spaces.)

julia> collect(eachmatch(rg, str1))
6-element Vector{RegexMatch}:
 RegexMatch("1", 1="1")
 RegexMatch("a", 1="a")
 RegexMatch("2", 1="2")
 RegexMatch("b", 1="b")
 RegexMatch("3", 1="3")
 RegexMatch("c", 1="c")

julia> collect(eachmatch(rg, str2))
4-element Vector{RegexMatch}:
 RegexMatch("123", 1="123")
 RegexMatch("abc", 1="abc")
 RegexMatch("456", 1="456")
 RegexMatch("cde", 1="cde")

julia> collect(eachmatch(rg, str3))
5-element Vector{RegexMatch}:
 RegexMatch("12", 1="12")
 RegexMatch("a", 1="a")
 RegexMatch(" +) ", 1=" +) ")
 RegexMatch("3", 1="3")
 RegexMatch("b", 1="b")
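
If plain strings are wanted rather than RegexMatch objects, as in the question, the match field of each RegexMatch can be collected. A minimal sketch, where split_runs is only an illustrative helper name and not something from Base:

```julia
# Same pattern as above: runs of letters, digits, or punctuation/blanks.
rg = r"([[:alpha:]]+|[[:digit:]]+|(?:[[:punct:]]|[[:blank:]])+)"

# `split_runs` is a hypothetical helper; it returns the text of every match.
split_runs(s) = [String(m.match) for m in eachmatch(rg, s)]

split_runs("1a2b3c")        # ["1", "a", "2", "b", "3", "c"]
split_runs("123abc456cde")  # ["123", "abc", "456", "cde"]
split_runs("12a +) 3b")     # ["12", "a", " +) ", "3", "b"]
```

Characters that fall outside all three classes (a newline, for instance) are skipped by eachmatch rather than reported, so this is only suitable when that is acceptable.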

Thanks a lot!

I was racking my brain over alternatives before I read your answer, and a friend pointed me to
(?<=\d)(?=\D)|(?<=\D)(?=\d)
as the regular expression. On its own it only matches the empty positions between digits and non-digits, but
split(myString, r"(?<=\d)(?=\D)|(?<=\D)(?=\d)") offers a pretty clean way to separate numbers from other characters.
This is useful if I simply want to separate numbers from everything else, but your proposal is more complete because it treats symbols as a separate class.
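
For reference, a small sketch of that split call on the strings from the question (boundary is just an assumed variable name). Note in the last line that symbols stay attached to the neighbouring letters, since the pattern only distinguishes digits from non-digits:

```julia
# Zero-width pattern: matches the position between a digit and a non-digit.
boundary = r"(?<=\d)(?=\D)|(?<=\D)(?=\d)"

split("123abc456cde", boundary)  # ["123", "abc", "456", "cde"]
split("1a2b3c", boundary)        # ["1", "a", "2", "b", "3", "c"]
split("12a+)3b", boundary)       # ["12", "a+)", "3", "b"]
```

The pieces come back as SubString values that point into the original string; wrapping each one in String(...) makes independent copies if that is needed.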

I wonder whether writing [[:digit:]] makes it more readable than \d. One is longer but clearer, whereas the other keeps the expression shorter.
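
To compare the two spellings side by side, here is a rough sketch. The names posix, short, and runs are made up for illustration, and the short pattern is one possible rewrite rather than an exact equivalent:

```julia
# Long form with POSIX character classes (the answer's pattern, minus the outer group).
posix = r"[[:alpha:]]+|[[:digit:]]+|(?:[[:punct:]]|[[:blank:]])+"

# Shorter form: \p{L} for letters, \d for digits, and a negated class for the rest.
short = r"\p{L}+|\d+|[^\p{L}\d]+"

# Hypothetical helper: collect the matched text for a given pattern.
runs(rx, s) = [String(m.match) for m in eachmatch(rx, s)]

runs(posix, "12a +) 3b")  # ["12", "a", " +) ", "3", "b"]
runs(short, "12a +) 3b")  # ["12", "a", " +) ", "3", "b"]
```

On input like the examples in this thread the two give the same pieces. They are not interchangeable in general: the negated class in the short form puts anything that is not a letter or digit, newlines included, into the third group, whereas the POSIX version only accepts punctuation and blanks there.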
