It’s easy to retrieve a captured group in
replace and do something like:
julia> replace("foobar", r"b([a-z])r" => s"\1")
However, I can’t pass the captured group to a function, so for instance this does not do what I want:
julia> replace("foobar", r"b([a-z])r" => uppercase(s"\1"))
This works, but is a bit silly because it matches twice:
julia> rx = r"b([a-z])r"
julia> replace("foobar", rx => s -> uppercase(match(rx, s).captures[1]))
Is there an obvious way to do this better? Generally, what might be nice is to do something like:
replace("foobar", regex => myfun)
where myfun would receive the regex match object and could use fields like m.captures:
myfun(m) = uppercase(m.captures[1])
replace("foobar", regex => myfun) already has a meaning: it replaces the matched substring (i.e.
match(rx, "foobar").match) by the result of applying
myfun to it (if the matched string exists).
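To make the current behaviour concrete, here is a small sketch using the regex from the question. The function on the right-hand side only ever sees the matched substring ("bar"), not the capture group:

```julia
rx = r"b([a-z])r"

# What the function currently receives: the whole matched substring.
matched = match(rx, "foobar").match          # "bar"

# The function is applied to that substring, so the entire match is uppercased.
result = replace("foobar", rx => uppercase)  # "fooBAR"
```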
Using this interface to apply the function to the matched string makes sense, and is simpler to reason about. E.g., what should be done if there are various captured substrings? Maybe apply
myfun to all? Ok, that might make sense.
Now then, what if the captured substrings are nested into each other? Well,
uppercase may be an easy one for this, but there may be others less clear, e.g.
So I agree that the change I’m suggesting is breaking and so would not happen before 2.0, if at all.
Note that I disagree it’s ambiguous; the current behaviour in “my” perspective amounts to passing
m.match; if you have access to
m you have strictly more information than in the current case (you have the match, you have the groups for the
SubstitutionString and you could apply a function to any of these things).
I might just copy the code from
replace and implement a
replace_transform or something like it.
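A minimal sketch of what such a helper could look like. Rather than copying Base's implementation, this version is built on eachmatch, which is an assumption on my part; replace_transform is a hypothetical name and not part of Base:

```julia
# Hypothetical helper (not in Base): like `replace`, but `f` receives the
# RegexMatch object instead of the matched substring, so the regex runs once.
function replace_transform(f, s::AbstractString, rx::Regex)
    io = IOBuffer()
    pos = firstindex(s)
    for m in eachmatch(rx, s)
        # Copy the text between the previous match and this one.
        print(io, SubString(s, pos, prevind(s, m.offset)))
        # Insert whatever `f` computes from the full match object.
        print(io, f(m))
        pos = m.offset + ncodeunits(m.match)
    end
    # Copy the text after the last match.
    print(io, SubString(s, pos))
    return String(take!(io))
end

replace_transform(m -> uppercase(m.captures[1]), "foobar", r"b([a-z])r")  # "fooA"
```

Taking f as the first argument also allows do-block syntax, which may be convenient for longer transformations.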
PS: I hope it’s clear that the
uppercase example is just a toy example to clarify my use case.
It’s not impossible to allow for something like this in a non-breaking way. You could introduce a wrapper
SubstitutionFunction, which would wrap a function and if used on the right hand side of
replace together with a regex, it gets passed the matches instead of the whole string.
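A rough sketch of how the wrapper idea could work. SubstitutionFunction does not exist in Base, and to avoid extending Base.replace here the dispatch lives in a stand-in function with the hypothetical name replace_matches:

```julia
# Hypothetical wrapper type: marks a function that wants the RegexMatch.
struct SubstitutionFunction{F}
    f::F
end

# Stand-in for `replace`; dispatches on the wrapper on the right-hand side.
function replace_matches(s::AbstractString, p::Pair{Regex,<:SubstitutionFunction})
    rx, sf = p
    out = IOBuffer()
    pos = firstindex(s)
    for m in eachmatch(rx, s)
        # Text before this match, then the wrapped function applied to the match.
        print(out, SubString(s, pos, prevind(s, m.offset)), sf.f(m))
        pos = m.offset + ncodeunits(m.match)
    end
    print(out, SubString(s, pos))  # text after the last match
    return String(take!(out))
end

replace_matches("foobar", r"b([a-z])r" => SubstitutionFunction(m -> uppercase(m.captures[1])))  # "fooA"
```

Because the wrapper is a new type, a plain function on the right-hand side would keep its current meaning, which is what makes this approach non-breaking.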
Opened https://github.com/JuliaLang/julia/issues/36293 for further discussion. Depending on feedback, I’ll give it a shot.