[ANN] SemanticAST.jl - semantic AST analysis for Julia source code


If you’ve tried to do analysis of existing Julia code (or have written a macro) then you’re probably familiar with how arcane Julia’s Expr grammar is. There isn’t a coherent source of truth for what forms you might get through and all of the weird and wonderful forms that the parser can produce are an endless source of surprises. In turn, analyzing Julia source code is hard, because not only do you need to implement the analysis that you’re interested in, you also have to figure out the basic structure of the Exprs you need to look for.

SemanticAST.jl is my attempt to improve this situation. It builds on JuliaSyntax.jl’s reimplementation of the Julia parser in Julia by implementing an semantic analysis that figures out what those Exprs actually mean. For example, SemanticAST has only three ways to define a function, not the several dozen that arise out of Expr itself. Unlike JuliaSyntax, SemanticAST does not attempt to preserve source order form: the goal of SemanticAST is to provide a semantically meaningful AST rather than one that matches the source code exactly.

The goal of SemanticAST.jl is to be able to analyze any Julia code that passes lowering - that is, gets to the point where Julia will try to execute it. It will also attempt to produce useful error messages that identify exactly what and where went wrong and has fallbacks so it can continue analyzing source code after a failure.

As an example, consider the following program:

function hello_world()
           println("hello world")

JuliaSyntax then produces

line:col│ tree                                   │ file_name
   1:1  │[function]
   1:10 │  [call]
   1:10 │    hello_world
   1:23 │  [block]
   2:1  │    [call]
   2:1  │      println
   2:9  │      [string]
   2:10 │        "hello world"

which SemanticAST then simplifies into

        [], [], [], nothing, 
            FunCall(Variable(:println), [
                PositionalArg(StringInterpolate(["hello world"]))], [])]))

Let’s spice it up a little and consider a somewhat more horrific case. Say,

function (::Type{U})(arg::String; arg1::T=3) where {U<:Int, T<:U}
        [arg for y = 1:arg1, w=1:arg1]    

which SemanticAST interprets as

        DeclName(nothing, CallCurly(Variable(:Type), [Variable(:U)])), 
        [FnArg(IdentifierAssignment(:arg), nothing, Variable(:T))], 
        [KwArg(:arg1, Variable(:T), Literal(3), false)], 
        [TyVar(:U, Variable(:Int), nothing), TyVar(:T, Variable(:U), nothing)], 
        CallCurly(Variable(:Vector), [Variable(:T)]), 
                Generator(Variable(:arg), [
                        IterEq(IdentifierAssignment(:y), FunCall(Variable(:(:)), [PositionalArg(Literal(1)), PositionalArg(Variable(:arg1))], [])), 
                        IterEq(IdentifierAssignment(:w), FunCall(Variable(:(:)), [PositionalArg(Literal(1)), PositionalArg(Variable(:arg1))], []))])]))])))

While not displayed here, SemanticAST retains the detailed source information provided by JuliaSyntax, allowing for byte-accurate error reporting.

SemanticAST is still in early form yet; the test corpus is small and macro support in particular is weak. It is still academic software and is suitably buggy as a result. Additionally, the output AST representation is subject to change. The library is part of the implementation of the static type system for Julia that I developed as part of my thesis and I hope that it can make the lives of other Julia source-level analysis writers just that little bit easier.