Parse structured string to dictionary



Let’s say I have a string which has the following template structure:

Example: “a_3_b_4_c_5.whatever.jld”

My question is how to conveniently generate such a string from given (string) values for a, b, c, txt and ext and (more importantly) how to parse it back into a dict with keys a, b, c, txt and ext.

In Python, this is how I would do it:

import parse
templ = "a_{a}_b_{b}_c_{c}.{txt}.{ext}"

# generating
a = "3"
b = "4"
c = "5"
txt = "whatever"
ext = "jld"
s = templ.format(a=a, b=b, c=c, txt=txt, ext=ext)
# s == 'a_3_b_4_c_5.whatever.jld'

# parsing
dict = parse.parse(templ, s)
# dict == <Result () {'a': '3', 'c': '5', 'txt': 'whatever', 'b': '4', 'ext': 'jld'}>
dict["a"] # == "3"
dict["ext"] # == "jld"

In Julia:

# generating
a = "3"
b = "4"
c = "5"
txt = "whatever"
ext = "jld"
s = "a_$(a)_b_$(b)_c_$(c).$(txt).$(ext)"

# parsing
# this is my question

How do I do the parsing part nicely? Of course I could use split etc. manually, but that seems rather inconvenient if you have to do it often, and it doesn’t generalize. Is there a package for this?


I think that split is a very convenient solution, eg

julia> function parse1(s)
           pairs, txt, ext = split(s, ".")
           dict = Dict(Iterators.partition(split(pairs, "_"), 2))
           dict, txt, ext
       end
parse1 (generic function with 1 method)

julia> s = "a_3_b_4_c_5.whatever.jld"

julia> parse1(s)
(Dict("c"=>"5","b"=>"4","a"=>"3"), "whatever", "jld")


Thanks for your answer. However, you encode the template information in your parse1 function, which I find inconvenient compared to the nice Python version above.

As a consequence, it also doesn’t generalize nicely. Imagine another string with a completely different structure. You would have to write a new version of parse1 every time, while in Python I just define the new template (a single, very natural, line).


I see what you want now, sorry I did not get it the first time.

I am not aware of a package that does this. However, you should be able to do the following fairly easily:

  1. write a function that parses a template to a regexp and a vector of keys for each position,
  2. wrap it in a structure,
  3. define a function that uses the regexp to capture the matches, then generate the dictionary.

If you need help, please ask here.
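The three steps above could be sketched roughly as follows. This is only a minimal sketch, not an existing package: the names Template, compile_template, parse_template and escape_literal are made up for illustration, and it assumes Python-style {name} placeholders and field values without newlines.

```julia
# Hypothetical names throughout; a sketch of the three steps, not a package API.

# Step 2: a structure holding the compiled regex and the key of each capture.
struct Template
    pattern::Regex
    names::Vector{String}
end

# Backslash-escape regex metacharacters so literal text matches itself.
function escape_literal(s)
    out = IOBuffer()
    for c in s
        c in raw"\^$.|?*+()[]{}" && print(out, '\\')
        print(out, c)
    end
    String(take!(out))
end

# Step 1: turn e.g. "a_{a}.{ext}" into a regex plus the list of keys.
function compile_template(tmpl::AbstractString)
    names = String[]
    pattern = "^"
    pos = 1
    for m in eachmatch(r"\{(\w+)\}", tmpl)
        # literal text between the previous placeholder and this one
        pattern *= escape_literal(tmpl[pos:prevind(tmpl, m.offset)])
        pattern *= "(.*?)"                      # non-greedy capture for the field
        push!(names, String(m.captures[1]))
        pos = m.offset + ncodeunits(m.match)
    end
    pattern *= escape_literal(tmpl[pos:end]) * "\$"
    Template(Regex(pattern), names)
end

# Step 3: match the string and pair each key with its capture.
function parse_template(t::Template, s::AbstractString)
    m = match(t.pattern, s)
    m === nothing && error("string does not match template")
    Dict(zip(t.names, String.(m.captures)))
end

t = compile_template("a_{a}_b_{b}_c_{c}.{txt}.{ext}")
d = parse_template(t, "a_3_b_4_c_5.whatever.jld")
d["ext"]   # "jld"
```

Anchoring the pattern with ^ and $ and using non-greedy captures are design choices of this sketch; field values that themselves contain the separators can still be split in unexpected places.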


This is rudimentary and could use some refinements, but basically works:

struct TemplateParser
    pattern::Regex           # regex with one capture group per field
    names::Vector{Symbol}    # field name for each capture group, in order
end

function Base.parse(tp::TemplateParser, s)
    m = match(tp.pattern, s)
    m === nothing && error("no match")
    Dict(zip(tp.names, m.captures))
end

function escape_regex(s)        # NOTE probably could use some work
    e = ""
    for c in s
        if c ∈ ['.', '*', '\\']
            e *= '\\'
        end
        e *= c
    end
    e
end

macro templateparser(s)
    s.head == :string || error("Use a string expression with interpolation")
    names = Vector{Symbol}()
    pattern = ""
    for arg in s.args
        if arg isa String
            # literal part of the template
            pattern *= escape_regex(arg)
        elseif arg isa Symbol
            # interpolated variable: becomes a capture group
            pattern *= "(.*)"
            push!(names, arg)
        end
    end
    TemplateParser(Regex(pattern), names)
end

t = @templateparser "a_$(a)_b_$(b)_c_$(c).$(txt).$(ext)"

parse(t, "a_3_b_4_c_5.whatever.jld")

No doubt it has horrible corner cases :smile:
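To illustrate one such corner case: every field is captured with a greedy (.*), so a value that contains a separator character shifts where the string is split. The first regex below is the pattern written out by hand for the template above; the stricter variant is only one possible tightening (an assumption, not part of the code above), and it forbids _ and . inside field values.

```julia
# Hand-written equivalent of the pattern built for "a_$(a)_b_$(b)_c_$(c).$(txt).$(ext)"
greedy = r"a_(.*)_b_(.*)_c_(.*)\.(.*)\.(.*)"
m = match(greedy, "a_3_b_4_c_5.what.ever.jld")
m.captures[3]   # "5.what": the greedy group for c swallows part of txt

# One possible tightening: anchor the pattern and forbid separators inside fields.
strict = r"^a_([^_.]*)_b_([^_.]*)_c_([^_.]*)\.([^_.]*)\.([^_.]*)$"
match(strict, "a_3_b_4_c_5.what.ever.jld") === nothing   # true: rejected rather than misparsed
match(strict, "a_3_b_4_c_5.whatever.jld").captures[4]    # "whatever"
```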


Thanks! The macro is nice and clever! This was my attempt (I’m really really bad at string parsing/regexp etc.)

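# NOTE: matchall and searchindex are Julia 0.6 functions (removed in 1.0)
function myparse(tmpl::String, s::String)
    tmp = s
    # field names: whatever sits between { and } in the template
    kwds = matchall(r"(?<={).+?(?=})", tmpl)
    # literal text surrounding the fields
    splits = [split(s, "}")[end] for s in split(tmpl, "{")]
    if splits[end] == ""
        # template ends with a field: append a sentinel delimiter
        splits[end] = "."
        tmp *= "."
    end

    vals = Vector{String}(length(splits)-1)
    for k in 1:length(splits)-1
        # drop everything up to and including the k-th literal, then read up to the next one
        tmp = tmp[searchindex(tmp, splits[k])+length(splits[k]):end]
        vals[k] = split(tmp, splits[k+1])[1]
    end

    return Dict(zip(kwds, vals))
end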