Using replace with a function of the match

Hello,

It’s easy to retrieve a captured substring in replace and do something like:

julia> replace("foobar", r"b([a-z])r" => s"\1")
"fooa"

However, I can’t pass the captured group to a function; for instance, this does not do what I want:

julia> replace("foobar", r"b([a-z])r" => uppercase(s"\1"))
"foo\\1"

This works, but is a bit silly with the double matching:

julia> rx = r"b([a-z])r"
julia> replace("foobar", rx => s -> uppercase(match(rx, s).captures[1]))
"fooA"

Is there an obvious way to do this better? Generally, what might be nice is to be able to write something like:

replace("foobar", regex => myfun)

where myfun would receive the regex match itself and so have access to fields like captures:

myfun(m) = uppercase(m.captures[1])

Thanks!

cc: @Wikunia

But replace("foobar", regex => myfun) already has a meaning: it replaces the matched substring (i.e. match(rx, "foobar").match) by the result of applying myfun to it (if the matched string exists).
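
For reference, a minimal example of that existing behaviour, using the regex from the question. The function on the right-hand side is called with the matched substring ("bar" here), not with a RegexMatch, so the capture groups are not visible to it:

```julia
# The function is called with the matched substring "bar",
# so the whole match is uppercased, not just the captured group.
result = replace("foobar", r"b([a-z])r" => uppercase)
println(result)  # prints fooBAR
```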

Using this interface to apply the function to the matched string makes sense, and is simpler to reason about. E.g., what should be done if there are various captured substrings? Maybe apply myfun to all? Ok, that might make sense.

Now then, what if the captured substrings are nested within each other? Well, uppercase may be an easy one for this, but there may be others where the result is less clear, e.g. strip, etc.

So I agree that the change I’m suggesting is breaking and so would not happen before 2.0, if at all.

Note that I disagree that it’s ambiguous; from “my” perspective the current behaviour amounts to passing m.match. If you have access to m, you have strictly more information than in the current case: you have the match, you have the groups that a SubstitutionString can refer to, and you could apply a function to any of these things.

I might just copy the code from replace and implement a replace_transform or something like it.
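
A rough sketch of what such a helper could look like. This is not copied from Base's replace; it is a simplified version built on eachmatch, and the name replace_transform and its argument order are just placeholders. It assumes the function always returns something printable and ignores options like count:

```julia
# Hypothetical helper, not part of Base: like replace, but `f` is called
# with the full RegexMatch instead of the matched substring.
function replace_transform(f, s::AbstractString, rx::Regex)
    out = IOBuffer()
    pos = firstindex(s)
    for m in eachmatch(rx, s)
        # Copy the text between the previous match and this one.
        print(out, SubString(s, pos, prevind(s, m.offset)))
        # Insert whatever `f` builds from the match and its captures.
        print(out, f(m))
        # Continue right after the end of this match.
        pos = m.offset + ncodeunits(m.match)
    end
    # Copy the remaining tail of the string.
    print(out, SubString(s, pos))
    return String(take!(out))
end

myfun(m) = uppercase(m.captures[1])
result = replace_transform(myfun, "foobar", r"b([a-z])r")
println(result)  # prints fooA
```

With the function as first argument, a do block also works: replace_transform("foobar", rx) do m ... end.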

PS: I hope it’s clear that the uppercase example is just a toy example to clarify my use case.

It’s not impossible to allow for something like this in a non-breaking way. You could introduce a wrapper SubstitutionFunction, which would wrap a function and if used on the right hand side of replace together with a regex, it gets passed the matches instead of the whole string.
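
To make the idea concrete, here is a small sketch. SubstitutionFunction does not exist in Base, and replace_match is only a stand-in name so that the sketch does not add methods to Base.replace; in an actual implementation this would presumably be a new method of replace itself:

```julia
# Hypothetical wrapper type; nothing like this exists in Base.
struct SubstitutionFunction{F}
    f::F
end

# Stand-in for the proposed `replace` method: when the right-hand side is
# wrapped, the function receives the RegexMatch rather than the substring.
function replace_match(s::AbstractString, p::Pair{Regex,<:SubstitutionFunction})
    rx, sf = p.first, p.second
    out = IOBuffer()
    pos = firstindex(s)
    for m in eachmatch(rx, s)
        # Text before the match, then the result of calling the function on it.
        print(out, SubString(s, pos, prevind(s, m.offset)), sf.f(m))
        pos = m.offset + ncodeunits(m.match)
    end
    print(out, SubString(s, pos))
    return String(take!(out))
end

# Any other pair keeps today's behaviour by falling through to Base.replace.
replace_match(s::AbstractString, p::Pair) = replace(s, p)

wrapped = replace_match("foobar", r"b([a-z])r" => SubstitutionFunction(m -> uppercase(m.captures[1])))
plain = replace_match("foobar", r"b([a-z])r" => uppercase)
println(wrapped)  # prints fooA
println(plain)    # prints fooBAR
```

Since the wrapper is opt-in, existing code that passes a plain function keeps its current meaning.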

Opened https://github.com/JuliaLang/julia/issues/36293 for further discussion; depending on feedback, I’ll give it a shot.
