Match.jl: Getting hold of regex captures? (`\1` doesn't seem to work.)

In general, is it possible to get hold of regex captures outside the RegexMatch object? In particular, can we get them in a pattern match in Match.jl?

This is what I have found so far:

using Match

function matchreg(s)
  @match s begin
    r"abc(\d+)" => println(s"\1") # capture doesn't work
    r"xyz(\d+)" => println(replace(s, r"xyz(\d+)" => s"\1"))
  end
end

matchreg("abc01") # => "\1" . . . doesn't work
matchreg("xyz02") # => "02" . . . works

In the successful example above, you need to carry out the same regex match on both sides of =>.

Using explicit RegexMatch objects is more tedious and less readable when you have to branch on which pattern the string matches, which is exactly where Match.jl shines.

In Ruby and Perl, you have the global variables $1, $2, . . . , which are the most recent captures.


Under the Regular Expressions subheading in the docs, it says:

Match.jl used to have complex regular expression handling, permitting the capturing of matched subpatterns. We are considering adding that back again.

So it looks like this feature isn’t available (yet).

Internally, the match seems to be checked using the occursin function anyway, so there’s only a boolean return value even in the macro expansion, no RegexMatch object.
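To make the difference concrete, here is a small sketch in plain Julia (no Match.jl involved; the variable names are just for illustration). occursin only answers yes or no, while match hands back the RegexMatch that carries the captures:

```julia
s = "abc01"

# occursin only reports whether the pattern matches; the captures are discarded
hit = occursin(r"abc(\d+)", s)    # a Bool

# match returns a RegexMatch on success, or nothing on failure
m = match(r"abc(\d+)", s)

if m !== nothing
    println(m[1])                 # first capture group
end
```

So any macro that only calls occursin has nothing to expose on the right-hand side of =>.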

And afaik, the answer to “In general, is it possible to get hold of regex captures outside the RegexMatch object?” is a no in Julia. Julia’s philosophy is to generally avoid global state mutation and keep changes local - which generally leads to more readable, manageable code, but can be frustrating in cases like this.

If you want to avoid the double matching on the regexes, one option is to implement your own simple matching code, e.g.:

julia> function matchreg2(s)
           regexactions = Dict(
               r"abc(\d+)"       => m -> println(m[1]),
               r"xyz(?<num>\d+)" => m -> println(m["num"]),
               r"pat(\d+)tern"   => m -> begin
                   n = "|" * m[1] * "|"
                   println(n)
               end
           )

           for r in keys(regexactions)
               m = match(r, s)
               isnothing(m) && continue
               regexactions[r](m)
               break
           end
       end
matchreg2 (generic function with 1 method)

julia> matchreg2("xyz51")
51

julia> matchreg2("pat42tern")
|42|
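One caveat with the Dict version: Julia's Dict does not guarantee iteration order, so if a string could match more than one pattern, which action runs is not defined. A possible variant, sketched below, keeps the patterns in a Vector of pairs so they are tried in the order written. The names ORDERED_ACTIONS and firstmatch are made up for this example, and the actions return the value instead of printing it so the result is easier to reuse:

```julia
# Patterns are tried top to bottom; the first one that matches wins.
const ORDERED_ACTIONS = [
    r"abc(\d+)"       => m -> m[1],
    r"xyz(?<num>\d+)" => m -> m["num"],
    r"pat(\d+)tern"   => m -> "|" * m[1] * "|",
]

function firstmatch(s, actions = ORDERED_ACTIONS)
    for (r, action) in actions
        m = match(r, s)
        m === nothing && continue   # this pattern did not match, try the next
        return action(m)            # each regex is matched only once
    end
    return nothing                  # no pattern matched
end

println(firstmatch("xyz51"))
println(firstmatch("pat42tern"))
```

This is only a sketch; it does no more than the loop above, just with a defined order.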

Note that you don’t really need Match here: Julia has a built-in way to check for a regex match and use the result if successful:

function matchreg(s)
   isnothing(local m = match(r"abc(\d+)", s)) || return println(m[1])
   isnothing(match(r"xyz(\d+)", s)) || return println(replace(s, r"xyz(\d+)" => s"\1"))
end

This doesn’t help with your second part (replace(...)); I’m not sure what improvements can be made there.
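For the replace(...) part, one possibility is to reuse the RegexMatch you already have and splice the capture into the string by hand, using the match and offset fields of RegexMatch. This is only a rough sketch: the helper name replace_first_capture is made up, it assumes capture group 1 took part in the match, and unlike replace it only handles the first occurrence:

```julia
# Replace the matched region of `s` with capture group 1 of `m`,
# without running the regex a second time.
function replace_first_capture(s::AbstractString, m::RegexMatch)
    before = s[1:prevind(s, m.offset)]              # text before the match
    after  = s[m.offset + ncodeunits(m.match):end]  # text after the match
    return before * m[1] * after
end

function matchreg3(s)
    m = match(r"abc(\d+)", s)
    m === nothing || return println(m[1])
    m = match(r"xyz(\d+)", s)
    m === nothing || return println(replace_first_capture(s, m))
end

matchreg3("xyz02")
```

If the pattern always covers the whole string, as in the original example, m[1] alone already gives the same result.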

Thank you all for the helpful discussion.

Here, I digress. Since this is a digression, I don’t argue for anything.

Right. That’s why I was surprised to see that Julia’s Plots module has hidden states so that you can plot a line with plot(x,y) and then add another line with plot!(x2,y2). I don’t like this style at all and so I use explicit objects as in p = plot(x,y); plot!(p, x2,y2).

But then, I thought that the Regex module could have hidden states, too. I remember reading in the well known textbook on the Ruby language a teasing statement like, Admit, you still use $1 (the global variable that stores the most recent 1st regex capture), do you? This is funny because even though Ruby modeled its regex facility on Perl, it recommends the use of m = regexmatchfunction style like Julia, staying away from the global variables.

I’m with you about Plots. For some reason, this seems to be the general trend among packages to do with visual display. I’ve never been able to get into Javis.jl, despite being quite interested in this kind of visualization and animation, mainly because it works via this kind of hidden global state. Maybe the thought is that the figure (for Plots) or canvas (for Javis) is such an “obvious” common state that it isn’t worth passing around, but at least to me, it makes the code much harder to understand and reason about.

I remember reading in the well known textbook on the Ruby language a teasing statement like, Admit, you still use $1 (the global variable that stores the most recent 1st regex capture), do you? This is funny because even though Ruby modeled its regex facility on Perl, it recommends the use of m = regexmatchfunction style like Julia, staying away from the global variables.

That reminds me of the “modern Perl” efforts that also pushed for the “safe, right way” of doing things, but constantly came up against the fact that you can’t take too much of the convenience away, especially when people are so used to it. I don’t think they went so far as to recommend against $1 and such, but Ruby, being a more actively developed language with more new devs picking it up, can probably afford to push harder for the modern way of doing things.