Extract variables from strings

I’d like to write a macro to extract variables from strings. So writing

@string_extract     "test var_1 and var_2"      "test $x and $y"

would set

  x = "var_1"  
  y = "var_2"

So, is there a function to convert "test $x and $y" into a list or dictionary like:

[
  "test "    : string,
  x          : variable,
  " and "    : string,
  y          : variable 
]

Possibly related: Parse structured string to dictionary - General Usage - JuliaLang

A simple

_, x, _, y = split("test var_1 and var_2")

sets x and y as desired.
It was not clear from your description if variable names x and y are hard-wired or to be parsed first.
But using $x in a string does interpolation and requires x to be known.
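If the variable names are meant to come from the template itself, one function-based sketch is to turn the template into a regex with one named group per placeholder. This assumes the template is written as a raw string so that `$x` is not interpolated, and that the literal text never contains `\E`. The names `extract_vars` and `quote_literal` are made up for this illustration and are not part of Base.

```julia
# Hypothetical helpers, not part of Base or any package.

# Quote literal text for PCRE with \Q...\E (assumes the text contains no "\E").
quote_literal(s::AbstractString) = isempty(s) ? "" : "\\Q" * s * "\\E"

function extract_vars(template::AbstractString, str::AbstractString)
    pattern = "^"
    names = Symbol[]
    pos = 1
    # Every `$name` in the template becomes a lazy named capture group.
    for ph in eachmatch(r"\$(\w+)", template)
        literal = template[pos:prevind(template, ph.offset)]
        name = ph.captures[1]
        pattern *= quote_literal(literal) * "(?<$name>.+?)"
        push!(names, Symbol(name))
        pos = ph.offset + ncodeunits(ph.match)
    end
    pattern *= quote_literal(template[pos:end]) * "\$"
    m = match(Regex(pattern), str)
    m === nothing && return nothing
    return Dict(name => String(m[name]) for name in names)
end

vars = extract_vars(raw"test $x and $y", "test var_1 and var_2")
```

This returns a `Dict` keyed by the placeholder names rather than creating variables, so the values are read as `vars[:x]` and `vars[:y]`.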

julia> Meta.@dump "test $x and $y"
Expr
  head: Symbol string
  args: Array{Any}((4,))
    1: String "test "
    2: Symbol x
    3: String " and "
    4: Symbol y

The expression itself is already the solution that you want.
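To make that concrete, here is a small sketch that builds the same expression with `Meta.parse` and labels each argument as literal text or a variable. The `parts` name and the `:string` / `:variable` labels are only illustrative.

```julia
# Parse the string literal without evaluating it, so x and y need not exist.
ex = Meta.parse(raw"\"test $x and $y\"")

# Label each argument: String args are literal text, Symbol args are variables.
parts = [arg => (arg isa String ? :string : :variable) for arg in ex.args]

for (arg, kind) in parts
    println(repr(arg), " : ", kind)
end
```

The same `ex.args` vector is what a macro receives when it is called with `"test $x and $y"` as an argument.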

This could be one implementation, but it will only work if there are no two interpolations side by side:

macro string_extract(string::String, strexp::Expr)
    range = 0:0
    args = strexp.args
    strindices = map(args) do arg
        !(arg isa String) && return missing
        range = findnext(arg, string, range[end]+1)
    end

    ranges = map(1:length(strindices)) do i
        if ismissing(strindices[i])
            start = i == 1 ? 1 : nextind(string, strindices[i-1][end])
            stop = i == length(strindices) ? lastindex(string) : prevind(string, strindices[i+1][1])
            start:stop
        else
            strindices[i]
        end
    end

    args .=> getindex.(string, ranges)
end

julia> @string_extract "test var_1 and var_2" "test $x and $y"
4-element Vector{Pair{A, String} where A}:
 "test " => "test "
      :x => "var_1"
 " and " => " and "
      :y => "var_2"

Fantastic 🙂
But how do you go from this to having variables x and y with the values “var_1” and “var_2”?

macro string_extract(string::String, strexp::Expr)
    range = 0:0
    args = strexp.args
    strindices = map(args) do arg
        !(arg isa String) && return missing
        range = findnext(arg, string, range[end]+1)
    end

    ranges = map(1:length(strindices)) do i
        if ismissing(strindices[i])
            start = i == 1 ? 1 : nextind(string, strindices[i-1][end])
            stop = i == length(strindices) ? lastindex(string) : prevind(string, strindices[i+1][1])
            start:stop
        else
            strindices[i]
        end
    end

    pairs = args .=> getindex.(string, ranges)
    Expr(:block, [Expr(:(=), esc(sym), string)
        for (sym, string) in pairs if sym isa Symbol]...)
end
julia> @macroexpand @string_extract "test var_1 and var_2" "test $x and $y"
quote
    x = "var_1"
    y = "var_2"
end

The input to a macro is already a parsed expression — the point of using macros is generally to employ Julia-parsable syntax that is rewritten in some custom way.

It’s impossible to say for sure without knowing what you are ultimately trying to do, but as a general rule I would tend to re-think using string-based symbolic expressions.


Sorry Jules.
This is amazing, but after running the macro I’d like to write "extracted variables are: $x and $y"
and get "extracted variables are: var_1 and var_2". Instead I get UndefVarError: x not defined

Final goal. Say I need to transform

Input = """
Curve.My_Curve_1,Points=[(1,10),(2,11),(3,12)]
Curve.My_Curve_2,Points=[(5,8),(6,13),(7,15)]
Curve.My_Curve_3,Points=[(11,8),(12,13),(13,15)]
"""

to

Output = """
Curve My_Curve_1 x_values=[1,2,3] y_values=[10,11,12]
Curve My_Curve_2 x_values=[5,6,7] y_values=[8,13,15]
Curve My_Curve_3 x_values=[11,12,13] y_values=[8,13,15]
"""

Say we can use special characters ≪ ≫ to represent a newline separated list and ⪻ ⪼ a comma separated list. I’d like to write something like

@extract Input "≪Curve.$curve_name,Points=[⪻($x,$y)⪼]≫"

@insert "≪Curve $curve_name x_values=[⪻ $x ⪼] y_values=[⪻ $y ⪼]≫"

Maybe a function is better suited than macros.
If it is possible in some form though, something like this, I think, is more readable than multiple split and join statements, or regex.

It would be easier if the required inputs / outputs were in a readable format like JSON or XML but this isn’t always the case.


Definitely use a function for this. No need for macros here.
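As a rough sketch of what that could look like with plain Base regexes, assuming every line has exactly the `Curve.<name>,Points=[(x,y),...]` shape from your example and that the coordinates are integers. The function names `convert_curve` and `convert_curves` are made up here.

```julia
# Hypothetical helper: convert one "Curve.<name>,Points=[(x,y),...]" line.
function convert_curve(line::AbstractString)
    m = match(r"^Curve\.(\w+),Points=\[(.*)\]$", line)
    m === nothing && error("unrecognized line: $line")
    name, points = m.captures
    # Each point looks like "(x,y)"; collect both coordinates as text.
    coords = [(p.captures[1], p.captures[2])
              for p in eachmatch(r"\((-?\d+),(-?\d+)\)", points)]
    xs = join(first.(coords), ",")
    ys = join(last.(coords), ",")
    return "Curve $name x_values=[$xs] y_values=[$ys]"
end

# Hypothetical helper: convert a whole newline-separated block.
convert_curves(input::AbstractString) =
    join((convert_curve(l) for l in split(strip(input), '\n')), "\n")

input = """
Curve.My_Curve_1,Points=[(1,10),(2,11),(3,12)]
Curve.My_Curve_2,Points=[(5,8),(6,13),(7,15)]
"""
println(convert_curves(input))
```

The format knowledge lives in two regexes and one output string, which is not fully declarative but keeps each system's layout in one place.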

You could also use ReadableRegex.jl, although your structure really calls for something closer to a parser…

using ReadableRegex

input = "Curve.My_Curve_1,Points=[(1,10),(2,11),(3,12)]"

r = "Curve." *
    capture(one_or_more(ANY), as = "curve") *
    ",Points=[" *
    capture(one_or_more(ANY), as = "points") *
    "]"

m = match(r, input)
c = m["curve"]
p = m["points"]

That’s really cool. Had no idea that existed

Yeah I wrote that for long complicated regexes that are impossible to read otherwise.

This is valid Julia syntax, so you can let Julia parse it. For example:

@mymacro begin
    Curve.My_Curve_1,Points=[(1,10),(2,11),(3,12)]
    Curve.My_Curve_2,Points=[(5,8),(6,13),(7,15)]
    Curve.My_Curve_3,Points=[(11,8),(12,13),(13,15)]
end

will pass the parsed expression (abstract syntax tree) to your macro, which can then transform it in some way before it is executed.
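For illustration, a minimal sketch of such a macro, relying on the fact that each of those lines parses as a tuple assignment of the form `(Curve.name, Points) = [(x, y), ...]`. The macro name `@curves` is hypothetical, and this only works as long as the foreign format happens to be valid Julia syntax.

```julia
# Hypothetical macro: rewrites each parsed line into the target text format.
macro curves(block)
    lines = String[]
    for stmt in block.args
        stmt isa LineNumberNode && continue   # skip source-location nodes
        lhs, rhs = stmt.args                  # tuple on the left, vector on the right
        name = lhs.args[1].args[2].value      # Curve.My_Curve_1 -> :My_Curve_1
        xs = join((p.args[1] for p in rhs.args), ",")
        ys = join((p.args[2] for p in rhs.args), ",")
        push!(lines, "Curve $name x_values=[$xs] y_values=[$ys]")
    end
    return join(lines, "\n")   # a String is a valid (literal) expansion
end

output = @curves begin
    Curve.My_Curve_1,Points=[(1,10),(2,11),(3,12)]
    Curve.My_Curve_2,Points=[(5,8),(6,13),(7,15)]
end
println(output)
```

Nothing in the block is ever evaluated, so `Curve` and `Points` do not need to be defined; the macro only inspects the syntax tree.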

What is your actual final goal? It seems like you are trying to define some kind of domain-specific language? For what purpose?

Moving data between systems that have different proprietary text formats. In my example, the input could be the output from System 1 and the output could be the required input for System 2.

Rather than write lots of split statements, I wanted to declaratively express the format of each system.