Regex: ERROR: PCRE error: requested value is not set

Task: apply a regex to a multi-line string in which only some lines contain a custom tag.
I want to apply a substitution only to the lines that contain that field, and blank out all the other lines (data extraction).

current regex

.*label="([^"]*?)".*|.*
replace(s, r""".*aria-label="([^"]*?)".*|.*"""m => s"\1")

A partial (simple) method is to ignore the missed capture group and return a blank for it;
since the half after the "or" (the | alternation) is listed second, it blanks out the line.

Usability and simplicity may be factors.

How do I do the replace for all occurrences in the string?

reference:

  replace(s::AbstractString, pat=>r; [count::Integer])

  Search for the given pattern pat in s, and replace each occurrence with r. If count is provided, replace at most count occurrences. pat may be a single character, a vector or a set of characters, a string, or a regular expression. If r is a function, each occurrence is
  replaced with r(s) where s is the matched substring (when pat is a Regex or AbstractString) or character (when pat is an AbstractChar or a collection of AbstractChar). If pat is a regular expression and r is a SubstitutionString, then capture group references in r are replaced
  with the corresponding matched text. To remove instances of pat from string, set r to the empty String ("").

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> replace("Python is a programming language.", "Python" => "Julia")
  "Julia is a programming language."
  
  julia> replace("The quick foxes run quickly.", "quick" => "slow", count=1)
  "The slow foxes run quickly."
  
  julia> replace("The quick foxes run quickly.", "quick" => "", count=1)
  "The  foxes run quickly."
  
  julia> replace("The quick foxes run quickly.", r"fox(es)?" => s"bus\1")
  "The quick buses run quickly."

You might want to add an example string with the desired output so people can suggest ideas. Note, though, that optional capture groups are not properly substituted when they don’t match (see this bug: https://github.com/JuliaLang/julia/issues/31456).
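To make "a capture group that doesn't match" concrete, here is a small sketch using the pattern from the question. The two sample strings are made up for illustration. When the second branch of the alternation matches, group 1 takes no part in the match, and `match` reports it as `nothing`.

```julia
# Pattern from the question: first branch captures the label, second branch matches anything.
pat = r".*label=\"([^\"]*?)\".*|.*"

m1 = match(pat, "<a label=\"home\">")   # hypothetical line that has the attribute
m2 = match(pat, "<a href=\"x\">")       # hypothetical line that lacks it

println(m1.captures[1])                 # the captured attribute value
println(m2.captures[1] === nothing)     # group 1 is unset for this line
```

That unset group is what the substitution string s"\1" refers to on the non-matching lines.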

What you might want to do is use an if statement that checks whether there’s a match, and only then do the replace (yes, it’s not as nice, but it’s usually pretty fast anyway).
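One way to sketch that check while still using replace is to pass a function instead of a substitution string, so the "is there a match?" test happens inside the function. The helper name `extract_tag` and the shortened sample string are assumptions for illustration, not something from the thread.

```julia
# Hypothetical helper: return the B1_tag value found in a line, or "" if there is none.
function extract_tag(line::AbstractString)
    m = match(r"B1_tag=\"([^\"]*)\"", line)
    return m === nothing ? "" : String(m.captures[1])
end

s = "<div A_ID=\"A_attr.1\">\n" *
    "<div B1_tag=\"B1_attr.1\">B1_in</div>\n" *
    "<div B2_tag=\"B2_attr.1\">B2_in</div></div>"

# In multiline mode, ^.*$ matches one whole line at a time; each line is
# handed to extract_tag and replaced by whatever it returns.
result = replace(s, r"^.*$"m => extract_tag)
```

Because the newlines themselves are never part of a match, the result keeps one output line per input line.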

example

<div A_ID="A_attr.1">
<div B1_tag="B1_attr.1">B1_in</div>
<div B2_tag="B2_attr.1">B2_in</div></div>
<div A_ID="A_attr.2">
<div B1_tag="B1_attr.2">B1_in</div>
<div B2_tag="B2_attr.2">B2_in</div></div>
<div A_ID="A_attr.3">
<div B1_tag="B1_attr.3">B1_in</div>
<div B2_tag="B2_attr.3">B2_in</div></div>
replace(s, r""".*B1_tag="([^"]*?)".*|.*"""m => s"\1")

B1_attr.1


B1_attr.2


B1_attr.3

Note that line positioning is preserved.

[Atom screenshot]
Text editors such as Atom, VS Code, and Sublime Text (although they differ in other places)
all replace missed captures with blanks.

I appreciate the reply.

Also: the line positioning in the case above is used later.

s = """
   <div A_ID="A_attr.1">
   <div B1_tag="B1_attr.1">B1_in</div>
   <div B2_tag="B2_attr.1">B2_in</div></div>
   <div A_ID="A_attr.2">
   <div B1_tag="B1_attr.2">B1_in</div>
   <div B2_tag="B2_attr.2">B2_in</div></div>
   <div A_ID="A_attr.3">
   <div B1_tag="B1_attr.3">B1_in</div>
   <div B2_tag="B2_attr.3">B2_in</div></div>
   """

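io = IOBuffer()
# split(s) would split on any whitespace; eachline iterates over the actual lines
for line in eachline(IOBuffer(s))
   m = match(r""".*B1_tag="([^"]*?)".*""", line)
   if !isnothing(m)
      write(io, m.captures[1])   # keep only the captured attribute value
   end
   write(io, "\n")               # end every line so line positions are preserved
end
println(String(take!(io)))

Gives


B1_attr.1


B1_attr.2


B1_attr.3
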

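If the line positions are needed later, a variation is to record them directly rather than rely on blank lines. This is only a sketch under that assumption; the name `hits` and the shortened input are made up for illustration.

```julia
s = """
<div A_ID="A_attr.1">
<div B1_tag="B1_attr.1">B1_in</div>
<div B2_tag="B2_attr.1">B2_in</div></div>
<div A_ID="A_attr.2">
<div B1_tag="B1_attr.2">B1_in</div>
"""

# Collect (line number => captured value) pairs for the lines that match.
hits = Pair{Int,String}[]
for (lineno, line) in enumerate(eachline(IOBuffer(s)))
    m = match(r"B1_tag=\"([^\"]*)\"", line)
    m === nothing || push!(hits, lineno => String(m.captures[1]))
end

for (lineno, value) in hits
    println(lineno, ": ", value)
end
```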

Thanks so much!
