Let’s say I have a string which has the following template structure:
"a_{a}_b_{b}_c_{c}.{txt}.{ext}"
Example: “a_3_b_4_c_5.whatever.jld”
My question is how to conveniently generate such a string from given (string) values for a, b, c, txt and ext and (more importantly) how to parse it back into a dict with keys a, b, c, txt and ext.
In Python, this is how I would do it:
import parse
templ = "a_{a}_b_{b}_c_{c}.{txt}.{ext}"
# generating
a = "3"
b = "4"
c = "5"
txt = "whatever"
ext = "jld"
s = templ.format(a=a, b=b, c=c, txt=txt, ext=ext)
# s == 'a_3_b_4_c_5.whatever.jld'
# parsing
result = parse.parse(templ, s)  # avoid shadowing the builtin `dict`
# result == <Result () {'a': '3', 'b': '4', 'c': '5', 'txt': 'whatever', 'ext': 'jld'}>
result["a"] # == "3"
result["ext"] # == "jld"
In Julia:
# generating
a = "3"
b = "4"
c = "5"
txt = "whatever"
ext = "jld"
s = "a_$(a)_b_$(b)_c_$(c).$(txt).$(ext)"
# parsing
# this is my question
How do I do the parsing part nicely? Of course I could manually use split etc., but this seems rather inconvenient if you have to do it often, and it doesn’t generalize. Is there a package for this?
I think that split is a very convenient solution, e.g.
julia> function parse1(s)
           pairs, txt, ext = split(s, ".")
           dict = Dict(Iterators.partition(split(pairs, "_"), 2))
           dict, txt, ext
       end
parse1 (generic function with 1 method)
julia> s = "a_3_b_4_c_5.whatever.jld"
"a_3_b_4_c_5.whatever.jld"
julia> parse1(s)
(Dict("c"=>"5","b"=>"4","a"=>"3"), "whatever", "jld")
Thanks for your answer. However, you encode the template information in your parse1 function, which I find inconvenient compared to the nice Python version above.
As a consequence, it also doesn’t generalize nicely. Imagine another string with a completely different structure. You would have to write a new version of parse1 every time, while in Python I just define the new template (a single, very natural line).
I see what you want now, sorry I did not get it the first time.
I am not aware of a package that does this. However, you should be able to do the following fairly easily:
- write a function that parses a template to a regexp and a vector of keys for each position,
- wrap it in a structure,
- define a function that uses the regexp to capture the matches, then generate the dictionary.
If you need help, please ask here.
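To make the three steps above concrete, here is a minimal sketch that takes the Python-style `{name}` template at runtime. All names in it (`Template`, `compile_template`, `parse_template`, `render`, `escape_literal`) are made up for illustration and do not come from any package. It assumes placeholders are plain `{word}` tokens and that literal braces never appear in the template.

```julia
# Hypothetical helper type: the compiled regex, the key for each capture
# group, and the original template text (kept for generating strings).
struct Template
    pattern::Regex
    names::Vector{Symbol}
    source::String
end

# Backslash-escape characters that have a special meaning in a regex.
function escape_literal(s)
    io = IOBuffer()
    for c in s
        c in raw"\^$.|?*+()[]{}" && print(io, '\\')
        print(io, c)
    end
    return String(take!(io))
end

# Step 1 and 2: turn "a_{a}.{ext}" into an anchored regex plus a key vector.
function compile_template(tmpl::AbstractString)
    names = Symbol[]
    io = IOBuffer()
    print(io, '^')
    pos = 1
    for m in eachmatch(r"\{(\w+)\}", tmpl)
        print(io, escape_literal(tmpl[pos:prevind(tmpl, m.offset)]))
        print(io, "(.+?)")                      # lazy capture for the value
        push!(names, Symbol(m.captures[1]))
        pos = m.offset + ncodeunits(m.match)
    end
    print(io, escape_literal(tmpl[pos:end]), '$')
    return Template(Regex(String(take!(io))), names, String(tmpl))
end

# Step 3: match and zip the captures with the keys.
function parse_template(t::Template, s::AbstractString)
    m = match(t.pattern, s)
    m === nothing && error("string does not match template: ", s)
    return Dict(name => String(val) for (name, val) in zip(t.names, m.captures))
end

# Generating: substitute each {name} with the value stored under that key.
render(t::Template, vals) =
    replace(t.source, r"\{(\w+)\}" => ph -> string(vals[Symbol(chop(ph; head=1, tail=1))]))

t = compile_template("a_{a}_b_{b}_c_{c}.{txt}.{ext}")
d = parse_template(t, "a_3_b_4_c_5.whatever.jld")
s = render(t, d)
```

With this, a new file-name layout only needs a new template string, which is the property of the Python version that was asked for. The lazy `(.+?)` groups are a design choice: a value that itself contains the following delimiter will be split at its first occurrence.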
This is rudimentary and could use some refinements, but basically works:
struct TemplateParser
    pattern::Regex
    names::Vector{Symbol}
end

function Base.parse(tp::TemplateParser, s)
    m = match(tp.pattern, s)
    m === nothing && error("no match")
    Dict(zip(tp.names, m.captures))
end

function escape_regex(s) # NOTE probably could use some work
    e = ""
    for c in s
        if c ∈ ['.', '*', '\\']
            e *= '\\'
        end
        e *= c
    end
    e
end

macro templateparser(s)
    # a string literal without interpolation arrives as a plain String, not an Expr
    (s isa Expr && s.head == :string) || error("Use a string expression with interpolation")
    names = Vector{Symbol}()
    pattern = ""
    for arg in s.args
        if arg isa String
            # literal text between the interpolations
            pattern *= escape_regex(arg)
        elseif arg isa Symbol
            # each interpolated variable becomes a capture group
            pattern *= "(.*)"
            push!(names, arg)
        end
    end
    TemplateParser(Regex(pattern), names)
end

t = @templateparser "a_$(a)_b_$(b)_c_$(c).$(txt).$(ext)"
parse(t, "a_3_b_4_c_5.whatever.jld")
No doubt it has horrible corner cases.
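Two of those corner cases come from the generated pattern itself: the `(.*)` groups are greedy, and the regex is not anchored. The snippet below is only an illustration using plain `match` calls with hand-written patterns, not the macro output, to show what each choice does.

```julia
# Greedy groups: the first capture takes as much as it can,
# so the split happens at the LAST underscore.
greedy = match(r"(.*)_(.*)", "1_2_3")

# Lazy first group plus anchors: the split happens at the FIRST underscore.
lazy = match(r"^(.*?)_(.*)$", "1_2_3")

# Without ^ and $ the pattern also matches inside a longer string.
unanchored = match(r"a_(.*)\.jld", "xxa_3.jld.bak")
anchored   = match(r"^a_(.*)\.jld$", "xxa_3.jld.bak")

println(greedy.captures)        # "1_2" and "3"
println(lazy.captures)          # "1" and "2_3"
println(unanchored !== nothing) # true
println(anchored === nothing)   # true
```

Whether greedy or lazy is the right default depends on which fields may contain the delimiter, so it is probably best treated as an explicit decision per template. Note also that `escape_regex` above only escapes `.`, `*` and `\`, so a template containing other special characters such as `+` or `(` would still produce an unintended pattern.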
Thanks! The macro is nice and clever! This was my attempt (I’m really, really bad at string parsing, regexps, etc.):
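function myparse(tmpl::String, s::String)
    tmp = s
    # placeholder names: everything between "{" and "}"
    kwds = [m.match for m in eachmatch(r"(?<=\{).+?(?=\})", tmpl)]
    # literal text before, between and after the placeholders
    splits = [String(split(part, "}")[end]) for part in split(tmpl, "{")]
    if splits[end] == ""
        # template ends with a placeholder: append a sentinel delimiter
        splits[end] = "."
        tmp *= "."
    end
    vals = Vector{String}(undef, length(splits) - 1)
    for k in 1:length(splits)-1
        # drop everything up to and including the k-th literal ...
        tmp = tmp[first(findfirst(splits[k], tmp))+ncodeunits(splits[k]):end]
        # ... then the value runs up to the next literal
        vals[k] = split(tmp, splits[k+1])[1]
    end
    return Dict(zip(kwds, vals))
end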
For the sake of completeness, I wanted to point out that the package DrWatson has a function parse_savename that does something similar. See Naming Simulations · DrWatson.