Packages for text matching?

I’d like to write:

"$(First_Name).$(Last_Name),$(Age)" = "John.Smith,20"

with result

(First_Name = "John", Last_Name = "Smith", Age = 20 )

Is there any package that supports this kind of pattern matching on text?
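One way to get close to this without a package is to translate the template into a regex with named capture groups. This is a rough sketch: template_regex, match_template and escape_literal are hypothetical names, not from any package. It assumes each placeholder is written $(Name) with Name a plain identifier. The template is given as a raw string so that Julia does not try to interpolate the placeholders.

```julia
# Hypothetical helpers (not from any package): turn a template such as
# "$(First_Name).$(Last_Name),$(Age)" into an anchored regex with named groups.
const PLACEHOLDER = r"\$\((\w+)\)"

# Backslash-escape regex metacharacters so literal template text matches verbatim.
escape_literal(s::AbstractString) = replace(s, r"[\\^$.|?*+()\[\]{}]" => (c -> "\\" * c))

function template_regex(template::AbstractString)
    buf = IOBuffer()
    print(buf, '^')
    pos = firstindex(template)
    for m in eachmatch(PLACEHOLDER, template)
        # literal text between the previous placeholder and this one
        print(buf, escape_literal(template[pos:prevind(template, m.offset)]))
        # lazy named group; the ^...$ anchors make the groups cover the whole input
        print(buf, "(?<", m.captures[1], ">.*?)")
        pos = m.offset + ncodeunits(m.match)
    end
    print(buf, escape_literal(template[pos:end]), '$')
    return Regex(String(take!(buf)))
end

# Returns a NamedTuple of captured strings, or nothing if the input doesn't fit.
function match_template(template::AbstractString, s::AbstractString)
    names = Tuple(Symbol(p.captures[1]) for p in eachmatch(PLACEHOLDER, template))
    m = match(template_regex(template), s)
    m === nothing && return nothing
    return NamedTuple{names}(Tuple(m.captures))
end

person = match_template(raw"$(First_Name).$(Last_Name),$(Age)", "John.Smith,20")
```

All captures are strings, so person.Age is "20". Getting the integer 20 from the question still takes parse(Int, person.Age).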

This does not directly address your question, but in case you have not considered regex capture groups yet:
m=match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20")
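Individual groups can then be read by name. match returns nothing when the input doesn't fit the pattern. Every capture is a string, so the integer age in the question needs an explicit parse. A small sketch:

```julia
m = match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20")
m === nothing && error("input does not match the pattern")
first_name = m[:First_Name]        # a SubString, "John"
last_name  = m["Last_Name"]        # string keys work as well as symbols
age        = parse(Int, m[:Age])   # convert the captured "20" to an Int
```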


Thanks, Andreas.
That’s a great answer. I wish regexes were less verbose, but I guess they have to be to capture so many possibilities.

It’s nice that you can then write m[:Last_Name].
It seems you can’t convert m to a Dict or NamedTuple, though, which is a pity.

But you can, can’t you?

julia> m=match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20")
RegexMatch("John.Smith,20", First_Name="John", Last_Name="Smith", Age="20")

julia> Dict(keys(m) .=> m.captures)
Dict{String, SubString{String}} with 3 entries:
  "First_Name" => "John"
  "Last_Name"  => "Smith"
  "Age"        => "20"

julia> NamedTuple{Tuple(Symbol.(keys(m)))}(m.captures)
(First_Name = "John", Last_Name = "Smith", Age = "20")

I’m not sure I understand, but this sounds like a metaprogramming task.
Example below:

str = """First_Name, Last_Name, Age = "John", "Smith", 20 """
eval(Meta.parse(str))

resulting in assignments:

julia> First_Name
"John"
julia> Last_Name
"Smith"
julia> Age
20

FWIW: Parse structured string to dictionary

The Python solution is nice and easy. It would be great to have this in a package!

Update:

In Python, the OP’s example would just be:

import parse
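templ = "{First_Name}.{Last_Name},{Age:d}"  # ":d" converts Age to an int
result = parse.parse(templ, "John.Smith,20")
result.named  # {'First_Name': 'John', 'Last_Name': 'Smith', 'Age': 20}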

The eval approach has a few problems:

  1. It’ll work only in global scope, which may not be convenient.
  2. It doesn’t scale well. What if you have a file of strings to parse?
  3. It’s slow.
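Points 1 and 2 can both be avoided by keeping the parsing inside a function and applying one precompiled regex to each line. This is a minimal sketch. parse_people, Person and the file name people.txt are hypothetical, and the code assumes one record per line in the OP’s format.

```julia
# Compile the pattern once; ^ and $ make each line match in full or not at all.
const PERSON = r"^(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)$"
const Person = NamedTuple{(:First_Name, :Last_Name, :Age), Tuple{String, String, Int}}

function parse_people(io::IO)
    people = Person[]
    for line in eachline(io)
        m = match(PERSON, line)
        m === nothing && continue  # skip lines that don't fit the pattern
        push!(people, (First_Name = String(m[:First_Name]),
                       Last_Name = String(m[:Last_Name]),
                       Age = parse(Int, m[:Age])))
    end
    return people
end

# An IOBuffer stands in for a file here; open("people.txt") would work the same way.
people = parse_people(IOBuffer("John.Smith,20\nJane.Doe,31\nnot a record\n"))
```

No eval is involved, so nothing is assigned in global scope. The result is a plain vector of NamedTuples.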

keys(m) doesn’t work for me, I think because m isn’t a dictionary. Am I missing a package?

Perhaps it’s your Julia version? Which one are you using?

1.6.0

This Python parse package seems really cool, thanks Carsten.
Pity about the issues Andrey mentions.

It was probably only added in 1.7.

In that case you can do this:

julia> m=match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20")
RegexMatch("John.Smith,20", First_Name="John", Last_Name="Smith", Age="20")

julia> kd = Base.PCRE.capture_names(m.regex.regex)
Dict{Int64,String} with 3 entries:
  2 => "Last_Name"
  3 => "Age"
  1 => "First_Name"

julia> map(eachindex(m.captures)) do i
         get(kd, i, i) => m.captures[i]
       end |> Dict
Dict{String,SubString{String}} with 3 entries:
  "First_Name" => "John"
  "Last_Name"  => "Smith"
  "Age"        => "20"

julia> map(eachindex(m.captures)) do i
         Symbol(get(kd, i, i)) => m.captures[i]
       end |> x -> (; x...)
(First_Name = "John", Last_Name = "Smith", Age = "20")

It should work on all versions from 1.3 to 1.7. It probably works on 1.0 as well, but I haven’t tested it since I don’t have 1.0 available.

keys(::RegexMatch) was indeed added in 1.7.

In 1.7 we also added iterate(::RegexMatch), so you no longer need to go through m.captures.
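On 1.7 or later, keys and iteration let you read the captures straight off the match object. A short sketch, assuming Julia 1.7+:

```julia
# Requires Julia 1.7 or later: RegexMatch supports keys and iterate.
m = match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20")
ks = keys(m)                                     # ["First_Name", "Last_Name", "Age"]
first_name, last_name, age = m                   # destructuring goes through iterate
nt = NamedTuple{Tuple(Symbol.(ks))}(Tuple(m))    # no m.captures needed
```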


Will upgrade :)

1.7 hasn’t been released yet and is many months away from release.
Compat.jl might have these functions.
If not, you can open an issue (or even a PR) to port them over from the corresponding JuliaLang/Julia PRs.
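Until then, code that has to run on both older and newer versions can branch on VERSION. This is a sketch, and capture_dict is a hypothetical name. The pre-1.7 branch reuses the internal Base.PCRE.capture_names from the workaround above. That function is not public API and may change.

```julia
# Use keys(m) where it exists (1.7+), otherwise fall back to the internal
# Base.PCRE.capture_names lookup from the pre-1.7 workaround.
function capture_dict(m::RegexMatch)
    @static if VERSION >= v"1.7"
        ks = keys(m)
    else
        idx = Base.PCRE.capture_names(m.regex.regex)  # internal, may change
        ks = [get(idx, i, i) for i in eachindex(m.captures)]
    end
    return Dict(ks .=> m.captures)
end

d = capture_dict(match(r"(?<First_Name>\w+)\.(?<Last_Name>\w+),(?<Age>\d+)", "John.Smith,20"))
```

Because of @static, only the branch matching the running Julia version is compiled.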


Thanks