Learning Regular Expressions in Julia

  • I found it difficult to learn Regular Expressions in Julia.

  • I couldn’t understand what the different symbols in r"^\s*(?:#|$)" stand for. The Regular Expressions documentation is not detailed enough for many people to understand.

  • Can you suggest some lucid references for understanding Regular Expressions in Julia?

3 Likes

Regular expression syntax is not unique to any particular language. There are “dialects” of it, to be sure, but they are all very similar. Some places I have found that discuss them:

  • Python’s re documentation.
  • Racket’s re documentation.
  • Vim re tutorial - note that vim has some strange dialect things going on with regular expressions.

The pattern you referenced in your question, I believe means:

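  • ^: anchors the match at the start of the string
  • \s*: zero or more whitespace characters (as many as possible, or “greedy”)
  • (?:#|$): a non-capturing group. (?: is the non-capturing version of a regular parenthesis in a regular expression. A regular parenthesis captures each group so that it can be referenced later. This particular group looks for either a # or the end of the string ($ is an anchor here, not a literal dollar sign), so the whole pattern matches blank lines and comment lines.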
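
A quick way to check what the pattern does is to try it on a few strings. This is a minimal sketch; the sample lines and variable names are made up for illustration. Here `\s` matches whitespace and `$` is the end-of-string anchor.

```julia
# Pattern from the question. In an r"..." literal, `$` is not interpolated.
pattern = r"^\s*(?:#|$)"

# Hypothetical sample lines, chosen for illustration.
samples = ["# a comment", "   # indented comment", "", "   ", "x = 1  # trailing comment"]

for line in samples
    # occursin returns true when the regex matches somewhere in the string
    println(rpad(repr(line), 30), occursin(pattern, line))
end
```

The first four samples (comment lines and blank lines) match; the last one does not, because something other than whitespace comes before the #.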
1 Like

Check out Regex101. I find it extremely helpful both in learning and prototyping regex.

13 Likes

You mention prototyping regular expressions: this is more important than one might think. A poorly crafted regular expression can eat up all of your CPU time or memory on the wrong input (often called catastrophic backtracking). Imagine a webserver reading user input …
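
As a rough illustration of the kind of pattern worth prototyping first, here is a sketch using a textbook backtracking-prone regex with nested quantifiers. The pattern and inputs are assumptions chosen for the example, and the sizes are deliberately small. Timings depend on the machine and the regex engine, so none are shown; larger inputs may take much longer or hit the engine's internal limits.

```julia
# A classic backtracking-prone pattern: nested quantifiers followed by an anchor.
risky = r"^(a+)+$"

for n in (8, 12, 16)
    # Hypothetical input: n copies of 'a', then a character that forces the match to fail
    input = "a"^n * "b"
    t0 = time()
    matched = occursin(risky, input)
    seconds = time() - t0
    println("n = ", n, "  matched = ", matched, "  seconds = ", seconds)
end
```

The match fails for every input here; the point is only to watch how the time changes as n grows before trusting such a pattern with user input.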

1 Like

http://regex.info/book.html

1 Like

Thank you, but this book is lengthy. :laughing:

I highly recommend reading the first chapter. The author is very skilful at making a dry subject extremely fluid.

1 Like

You could also have a look at ReadableRegex.jl (jkrumbiegel/ReadableRegex.jl on GitHub), described as “regexes for people who don't really want to learn or read regexes”, and at the outputs of the different subfunctions to see what regexes they correspond to. The drawback is that it uses groups very liberally, so the output has a lot of unnecessary parentheses added (the computer doesn’t mind).

2 Likes

A little bit off-topic, and I do not intend to put you off.

There is value in learning regex if you plan to do much work where you will use it. And if you really use a lot of regex, you’ll learn it anyway (by doing).

Otherwise, I suggest learning the basic principles and searching for close-enough examples when you need to create your own. Stack Overflow is overflowing with such examples.

Spending days acquiring a skill that you will rarely use can be a waste of time (mainly because regex is not like riding a bike - stuff will slip from your memory).

But if you really want to embark on such an adventure, the others suggested pretty lovely materials.

2 Likes

You can do a lot with just literal text, the wildcard ., the “any number of times” quantifier *, character sets [], and maybe the optional marker ?.
I suggest you start with those and look up more complicated examples as needed.
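
A short sketch of those building blocks in Julia; the sample strings are made up for illustration.

```julia
# Literal text: matches exactly these characters
@show occursin(r"cat", "concatenate")

# `.` matches any single character (except a newline)
@show match(r"c.t", "a cut above").match

# `*` repeats the previous item zero or more times
@show match(r"ab*", "xabbbc").match

# `[]` matches one character from a set; here a digit followed by more digits
@show match(r"[0-9][0-9]*", "order 2024").match

# `?` makes the previous item optional
@show occursin(r"colou?r", "color")
@show occursin(r"colou?r", "colour")
```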

1 Like

Mastering Regular Expressions is an excellent book. It helped me understand the basics, but it covers regex in many different tools.
Can you suggest a more advanced book on regular expressions that is compatible with Julia? :juliatext:

Actually, Friedl is known for doing tremendous damage to regexp with that book of his. One is sure to be left with strong misunderstandings after reading it, due to Friedl not understanding the relevant theory himself. I’d advise anyone not to read that book, because it’s prone to do more harm than good.

As an example of how Friedl butchers finite automata theory, he not only thinks that DFA isn’t equivalent to NFA, he actually doesn’t at all understand what NFA is, equating it to his own regexp implementation.

2 Likes

I love my regex joining functions for building complex regexes from simpler ones.


#=   function regex_and
     Join two regexes together, i.e. regex_A followed by regex_B.
     Uses the `pattern` field, because string(a) returns the printed form r"...".
     Flags on the inputs are not carried over.
=#
function regex_and(a::Regex, b::Regex)
    result_str = "(?:" * a.pattern * ")" *
                 "(?:" * b.pattern * ")"
    return Regex(result_str)
end

#=   function regex_or
     Select either one of the two regexes, i.e. regex_A or regex_B.
=#
function regex_or(a::Regex, b::Regex)
    result_str = "(?:" *
                 "(?:" * a.pattern * ")" *
                 "|" *
                 "(?:" * b.pattern * ")" *
                 ")"
    return Regex(result_str)
end

#=   function regex_or (array method)
     Select any one element of the array of regexes.
=#
function regex_or(arr::Vector{Regex})
    # Wrap each pattern in a non-capturing group and separate them with "|"
    alternatives = join(("(?:" * ele.pattern * ")" for ele in arr), "|")
    return Regex("(?:" * alternatives * ")")
end
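
For comparison, here is a self-contained sketch of the same idea that does not rely on the helpers above. It assumes Julia 1.3 or later, where Base defines `*` for joining Regex values, and it reads the raw pattern text from the `pattern` field, since `string` on a Regex returns the printed form including the `r"..."` wrapper. The variable names and sample strings are made up for illustration.

```julia
# Two small building blocks (hypothetical examples)
number_re = r"\d+"
word_re   = r"[a-z]+"

# The `pattern` field holds the raw pattern text, without the r"..." wrapper
@show number_re.pattern

# Sequence: `*` joins regexes, number_re followed by word_re
number_then_word = number_re * word_re
@show occursin(number_then_word, "123abc")
@show occursin(number_then_word, "abc")

# Alternation: wrap each pattern in a non-capturing group and join with `|`
alternatives = join(("(?:" * r.pattern * ")" for r in (number_re, word_re)), "|")
either = Regex("^(?:" * alternatives * ")\$")
@show occursin(either, "123")
@show occursin(either, "abc")
@show occursin(either, "12ab")
```

Wrapping every piece in `(?:...)` keeps an inner `|` from leaking into the surrounding pattern, which is the same design choice the helpers above make.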
3 Likes

In addition to the other resources in this thread, I found https://regexr.com/ quite helpful to understand regexes I didn’t write.

2 Likes