An elegant and efficient way to extract something from ASTs

package

#1

MLStyle.jl provides all kinds of pattern matching to do matching on regular values, and ASTs are no exception.

Proposed by some developers, a new version which will be released recently has made progress in the make AST manipulations easier.

I prefer to use MacroTools.jl to make comparisons, for it’s quite specific in AST manipulations and the comparisons could indicate how elegant and fast is MLStyle.jl.

(Note that I don’t mean to diss the authors of MacroTools.jl, I do respect you all for your significant contributions to Julia.)

Check this:

At the very beginning, @Roger-luo has requested a feature about perform pattern matching on ASTs. At that time, I don’t really understand how important processing ASTs is.

However, when the desire of refactoring this package becoming more and more urgent, I found it quite difficult to rebuild a system to rewrite ASTs and extract what I want from them.

I then take some days to think about that, and image if I can perform pattern matching on ASTs in a most intuitive way:

ex = quote
function f(a, b, c)
    a + b + c
end
end
@match ex begin
  :(function funcname(arg1, arg2, args...)
       $expr 
   end) => (funcname, arg1, arg2, args, expr)
end

To achieve an intuitive pattern matching like above snippet, I refactored MLStyle.jl, and finally made what I want.

ex = quote
function f(a, b, c)
   a + b + c
end
end
@match ex begin
    :(function $funcname($arg1, $arg2, $(args...))
       :(::LineNumberNode)  # the first one of function body is a LineNumberNode
       expr
    end) => (funcname, arg1, arg2, args, expr) 
end

The form is more rigorous than the imagine of very beginning, for instance, people might want to match LineNumberNodes, so I shouldn’t remove LineNumberNodes from the value to be match. While the LineNumberNodes from ast pattern are removed, for it’s just a pattern, if you want to match some specific LineNumberNode, you can use reference pattern to do so:

my_line_number_node = ...
@match expr begin
      :(begin 
        $(&my_line_number_node)
        ...
       end) => ...
end 

Since implementing this kind of pattern matching on ASTs, related manipulations have become really easy. I found it behaves like some parser combinators, but not on token streams, it’s some parser combinators on Julia ASTs, which makes me feel quite excited.

Another point I want to present here is, MLStyle.jl is fast. It’s fast for it’s static, which makes it 3+ times faster than MacroTools.jl in the case presented at https://github.com/MikeInnes/MacroTools.jl.

ex = quote
  struct Foo
    x::Int
    y
  end
end

MacroTools.jl hides/ignores some information of ASTs, so it seems more concise. No LineNumberNode, and shapes of ASTs are not persisted(an Expr like :(a :: Int) will be transformed to a tuple (:a, Int)).

@capture(ex, struct T_ fields__ end)

However, MLStyle.jl works with more general cases, it performs matching exactly equals to the actual ASTs.
In MLStyle.jl, we can match the statements inside a struct definition in this way:

@match ex begin 
    :(
      (::LineNumberNode);
      struct $typename
          $(stmts...)
      end 
     ) => ...
end 

where stmts is an array of ASTs, more accurately, it’s the args field of a block Expr.

macro show(a) dump(a) end

@show struct Foo
       x :: Int
       y
       end


Expr
  head: Symbol struct
  args: Array{Any}((3,))
    1: Bool false
    2: Symbol Foo
    3: Expr
      head: Symbol block
      args: Array{Any}((4,))   <- This is exactly the `stmts`
        1: LineNumberNode
          line: Int64 2
          file: Symbol REPL[3]
        2: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol x
            2: Symbol Int
        3: LineNumberNode
          line: Int64 3
          file: Symbol REPL[3]
        4: Symbol y

And when you want to extract some from a sequence, you can use pattern Push and Many.

@match ex begin 
    PushTo(arr)  &&  # setup an array to push into
   :(
      (::LineNumberNode);
      struct $typename
          $(Many(a && Push(arr, a))...)
      end 
     ) => ...
end 

In above codes, the result arr will be same as the previous stmts.

But we might want to push an AST conditionally:

@match ex begin 
    PushTo(arr)  &&  # setup an array to push into
   :(
      (::LineNumberNode);
      struct $typename
           $(Many(
                 a && Case1 && Push(arr, a) ||
                 Case2
            )...)
      end 
     ) => ...
end 

Above snippet only push the AST node into arr when Case1 is satisfied.

Through above infrastructures, matching ASTs becomes a task like parsing something. You might want to take it as a new species of regex matching.