State of Julia grammar for tree-sitter

Hi!

I saw that Julia has an initial implementation of grammar for tree-sitter. However, the development seems stalled:

https://github.com/tree-sitter/tree-sitter-julia

I tried, but it seems to lack many things. It failed to recognize many patterns in my test.

Are those problems related to the lack of manpower to finish the grammar or some technical issue related to how tree-sitter works and Julia language syntax?

3 Likes

Probably just lack of interest—it sounds like a fun project, but it’s not clear whether anyone has a burning need for a tree-sitter Julia parser?

(Julia syntax also has some idiosyncrasies, so it’s possible that there are technical limitations of tree-sitter applied to Julia, but that’s more of interest to the tree-sitter developers. It seems unlikely given the range of languages they already parse, though. Our current parser is just recursive descent with limited backtracking IIRC, so it’s not that strange.)

2 Likes

Thanks @stevengj !

tree-sitter will supposedly provide a way faster syntax highlighting for Neovim. Currently, for some big files, it can be very slow.

If there is not any technical issue, I think I will give a try to see if I can help.

1 Like

hey @Ronis_BR !

I can’t directly write code to create such a tool but if you need early testers at any point, please let me know.
Happy to try it out as soon as possible - I have some issues with julia-vim so would love better syntax recognition. :slight_smile:

~ tcp :deciduous_tree:

1 Like

Yes, if/when editors—originally Atom, maybe someday vsCode (vsCode#50140)—and other tooling is built around tree-sitter (e.g. github may be able to use to show more information for pull requests?) then the motivation for Julia developers to work on tree-sitter support will increase.

2 Likes

Yes, the Julia parser is mostly a standard recursive descent parser. There’s not really any backtracking, but there’s plenty of places in which it post-processes the parsed output in an ad-hoc fashion. This is a bit like backtracking, but also somewhat like a lowering step built into the parser. It also makes certain parts of the existing parser hard to understand because parsing decisions might be second guessed in some other part of the code. There are places where a single token of input is put back into the token stream, but this is done locally so it’s mostly just an awkward way of peeking two tokens forward.

Most of the expression post-processing isn’t really necessary but comes about from reusing a single parsing function to parse subtly different parts of the grammar. I’ll hazard a guess that a formal Julia grammar would mostly be LL(1) and no worse than LL(3) in “most places” so generally it’s quite simple. But there’s a few problematic areas which might push the grammar into a more general class. Classifying parenthesized expressions as tuples vs blocks is particularly messy, especially considering anonymous function argument lists and the distinction between Expr(:kw, ...) vs normal assignment Expr(:(=), ...) for keyword arguments. There’s a few other oddities too.

tree-sitter uses the quite general GLR parsing algorithm which I’d guess could deal with these cases without a problem.

7 Likes