An elegant and efficient way to extract something from ASTs

MLStyle.jl provides all kinds of pattern matching to do matching on regular values, and ASTs are not the exception.

Proposed by some developers, a new version which will be released recently has made progress in simplifying AST manipulations.

I prefer to use MacroTools.jl to make comparisons, for it’s quite specific in AST manipulations and the comparisons could indicate how elegant and fast MLStyle.jl is.

(Note that I don’t mean to diss the authors of MacroTools.jl, I do respect you all for your significant contributions to Julia.)

Check this:

At the very beginning, @Roger-luo has requested a feature about perform pattern matching on ASTs. At that time, I don’t really understand how important processing ASTs is.

However, when the desire of refactoring this package becoming more and more urgent, I found it quite difficult to rebuild a system to rewrite or extract what I want from ASTs .

Then I took some days to think about that, and imagine how cool it would be if I can perform pattern matching on ASTs in a most intuitive way:

ex = quote
    function f(a, b, c)
        a + b + c
    end
end
@match ex begin
  quote 
     function funcname(arg1, arg2, args...)
       $expr 
   end => (funcname, arg1, arg2, args, expr)
end

To achieve an intuitive pattern matching like above, I refactored MLStyle.jl, and finally made what I want.

ex = quote
    function f(a, b, c)
       a + b + c
    end
end
@match ex begin
    quote function $funcname($arg1, $arg2, $(args...))
       :(::LineNumberNode)  # the first one of function body is a LineNumberNode
       $expr
    end
   end  => (funcname, arg1, arg2, args, expr) 
end

The form is more rigorous than the imagine version in the very beginning, for instance, people might want to match LineNumberNodes, so I shouldn’t remove LineNumberNodes from the original value. While the LineNumberNodes from ast pattern should be removed, for it’s just a pattern, and if you want to match some specific LineNumberNode, you can use reference pattern to do so:

my_line_number_node = ...
@match expr begin
      :(begin 
        $(&my_line_number_node)
        ...
       end) => ...
end 

Since this kind of pattern matching on ASTs is available, related manipulations have become unbelievable easy. I found it behaves like some parser combinators, but not on token streams, it’s some parser combinators on Julia ASTs, which makes me feel quite excited. In fact, it’s about the Homoiconicity.

Another point I want to present here is, MLStyle.jl is fast. It’s fast for it’s static, which makes it 3+ times( 5+ times in master branch) faster than MacroTools.jl according to the benchmark on the case presented at GitHub - FluxML/MacroTools.jl: MacroTools provides a library of tools for working with Julia code and expressions..

ex = quote
  struct Foo
    x::Int
    y
  end
end

MacroTools.jl hides/ignores some information of ASTs, so it seems more concise. No LineNumberNode, and shapes of ASTs are not persisted(an Expr like :(a :: Int) will be transformed to a tuple (:a, Int)).

@capture(ex, struct T_ fields__ end)

However, MLStyle.jl works with more general cases, it performs matching exactly equals to the actual ASTs.
In MLStyle.jl, we can match the statements inside a struct definition in this way:

@match ex begin 
    :(
      $(::LineNumberNode);
      struct $typename
          $(stmts...)
      end 
     ) => ...
end 

where stmts is an array of ASTs, more accurately, it’s the args field of a block Expr.

macro show(a) dump(a) end

@show struct Foo
       x :: Int
       y
       end


Expr
  head: Symbol struct
  args: Array{Any}((3,))
    1: Bool false
    2: Symbol Foo
    3: Expr
      head: Symbol block
      args: Array{Any}((4,))   <- This is exactly the `stmts`
        1: LineNumberNode
          line: Int64 2
          file: Symbol REPL[3]
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol x
            2: Symbol Int
        3: LineNumberNode
          line: Int64 3
          file: Symbol REPL[3]
        4: Symbol y

And when you want to extract some from a sequence, you can use pattern Push and Many.

@match ex begin 
    Do(arr = [])  &&  # setup an array to push into
   :(
      $(::LineNumberNode);
      struct $typename
          $(Many(a && Do(push!(arr, a)))...)
      end 
     ) => ...
end 

In above codes, the result arr will be same as the previous stmts.

But we might want to push an AST conditionally:

@match ex begin 
    PushTo(arr)  &&  # setup an array to push into
   :(
      $(::LineNumberNode);
      struct $typename
           $(Many(
                 a && Case1 && Do(push!(arr, a)) ||
                 Case2
            )...)
      end 
     ) => ...
end 

Above snippet only push the AST node into arr when Case1 is satisfied.

Through the infrastructures provided by MLStyle.jl, matching ASTs has becomes a task like parsing. You might want to take it as a new species of regex matching.

The url of repo is
GitHub - thautwarm/MLStyle.jl: Julia functional programming infrastructures and metaprogramming facilities.

Some examples could be found here:

12 Likes

I’ve been looking around for a more functional way of manipulating ASTs, and this looks like it good be a great help. Thank you!

2 Likes

You’re welcome!