My friend and I are final year undergraduate students. We have an interest in compilers/interpreters and we’ve done some work related to Clang/LLVM in the past.
For our final year undergraduate project, we have 6 months to work on a project of our choosing. Since I am a bit new to Julia, I’d appreciate it if I could get some feature requests of suitable difficulty that we can take a stab at. Ideally the project should add something to Julia or its tooling. It’s not compulsory that it be officially integrated into the mainline, so we’re open to trying out experimental things.
We just need a starting point. We’ll be doing all the required research. Thank you for your time!
If you want a cool, but pretty hard project, a Julia parser written in Julia would be really cool, and would make it easier for us to improve error messages, speed of parsing, and fix a few longstanding issues in Julia.
That’s only part of the battle, right? You will have to make it able to parse Julia code. Also, you have to make it work without special bootstrapping.
CSTParser.jl is quite well tested at this point, so I think the main challenge in replacing the current parser would be solving bootstrapping and providing the right entrypoints for the rest of Julia’s C ABI. I believe @c42f already started a lot of this work in https://github.com/JuliaLang/julia/pull/35243 and https://github.com/JuliaLang/julia/pull/35844, but it will certainly still be fairly nontrivial to implement. This is also not enough to completely replace all femtolisp code in Julia, since lowering is also implemented in Lisp.
Yes, I rearranged the runtime a bit so that it should be quite easy to plug in an alternative parser at runtime.
Actually with that background infrastructure done, there’s some relatively low-hanging fruit available here which would be an excellent starting point. For example, @Ashutosh_Pandey if you wanted to pick up https://github.com/JuliaLang/FancyDiagnostics.jl/pull/4 and finish it off it could be pretty useful.
Note that you don’t need to solve the bootstrapping problem for this to be useful, because the parser can be replaced at runtime instead.
Replacing all the flisp code in the frontend is indeed a big job. There’s roughly 2x more code in the lowering passes than the parser, and some of it is fairly tricky.
I assumed it was only in the parser (and that it’s not that speed-critical). Is it additionally only in the “lowering passes”, and do you have an idea whether performance could improve a lot by replacing it?
And on a related note, Python 3.9 changed to a PEG parser (from LL(1) parser):
Hi all, thank you for all the responses! I am the other guy working with @Ashutosh_Pandey on this project.
From what we understand, CSTParser.jl is already in very good shape, and the bootstrapping problem seems to be fairly complex for a first try. We do like @c42f’s comment about FancyDiagnostics.jl; it will probably be a great starting point to familiarize ourselves with the inner workings of the parser.
Do you have any resources which would help us get started with FancyDiagnostics.jl?
There aren’t a lot of resources or documentation, to be honest. But do feel free to ask specific questions here or elsewhere.
CSTParser is in very good shape performance- and functionality-wise, and it’s quite well battle tested due to being part of the VSCode extension, among other things. CSTParser is also a lot faster than the flisp parser — 20x or so when I measured it. Another great thing about CSTParser is that it produces precise source code location info, due to its use of a CST rather than an AST.
As of the start of this year, I thought CSTParser could improve in several areas:
Documentation - there’s almost none right now and I found the structure of the code a bit of a mystery.
Error states - CSTParser knows fairly precisely where an error occurred, but the reason for the error can be hard to extract.
@zacln or @davidanthoff might be able to provide guidance or explain where I’m mistaken about the state of things.
I think the advantages of rewriting the compiler frontend (parsing+lowering) would be to:
Preserve more source information to provide precise compiler diagnostics. For a start, syntax errors could be more precise — this is the low-hanging fruit.
Expose some of the source code transformations and analyses from lowering as APIs. This would enable certain types of advanced metaprogramming which currently require macros to re-implement parts of lowering.
Make the compiler implementation more accessible to the Julia community.
Possibly performance in some circumstances. I don’t think the compiler frontend is a huge bottleneck compared to inference and code generation in the default configuration. However, in certain circumstances (e.g., low optimization levels) the difference could be substantial. Largely this is just my gut feeling and it should be measured.
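To make the first two points a bit more concrete, here is a minimal sketch of what the frontend exposes today, using only Meta.parse and Meta.lower from Base. The variable names are just for illustration, and the exact payload inside an error expression differs between Julia versions, so treat the printed output as something to explore rather than rely on.

```julia
# Parse without throwing: a syntax problem comes back as an Expr whose head
# is :error or :incomplete instead of raising an exception.
bad = Meta.parse("f(x, "; raise=false)
println(bad.head)   # which kind of failure the parser reported
println(bad.args)   # the payload; its exact form varies between Julia versions

# A well-formed statement parses to an ordinary Expr tree.
good = Meta.parse("x = a + b * c")
dump(good)

# Meta.lower runs the lowering passes on an expression and returns the
# lowered form. It does not evaluate anything.
lowered = Meta.lower(Main, good)
println(lowered)
```

Note how little the error expression says about where and why parsing failed, and that the lowered form is only available as a finished result, not as individual passes you could call from a macro.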
Something that I think would make a fun project would be to write a Lisp-to-Julia string macro, so that, for example, lisp"""(display "Hello, world!")""" would expand to print("Hello, world!"), etc.
Not sure exactly how useful this would be. Julia already has most of the nice features of Lisp, but I think some people would enjoy embedding snippets of Lisp into their Julia code just for the fun of it.
Also, if you made it femtolisp compatible, then the Julia parser would technically be written in Julia once you wrap it in that string macro.
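A rough sketch of how such a string macro could start out is below. Everything in it (tokenize, read_sexpr!, to_julia, LISP_BUILTINS, and the handful of supported forms) is a hypothetical toy of my own, not an existing package and nowhere near femtolisp compatible. It only handles integers, strings, symbols, function calls, define and if.

```julia
# Split the source into tokens: parentheses, string literals, and atoms.
tokenize(src) = [m.match for m in eachmatch(r"\(|\)|\"[^\"]*\"|[^\s()\"]+", src)]

# Read one s-expression from the token list, consuming tokens as it goes.
function read_sexpr!(tokens)
    isempty(tokens) && error("unexpected end of input")
    tok = popfirst!(tokens)
    if tok == "("
        items = Any[]
        while true
            isempty(tokens) && error("missing closing parenthesis")
            first(tokens) == ")" && break
            push!(items, read_sexpr!(tokens))
        end
        popfirst!(tokens)                       # drop the ")"
        return items
    elseif tok == ")"
        error("unexpected closing parenthesis")
    elseif startswith(tok, "\"")
        return String(chop(tok; head=1, tail=1))  # strip the quotes
    else
        num = tryparse(Int, tok)
        return num === nothing ? Symbol(tok) : num
    end
end

# Hypothetical mapping from a few Lisp names to Julia functions.
const LISP_BUILTINS = Dict(:display => :print)

to_julia(x) = x                                  # integers and strings pass through
to_julia(s::Symbol) = get(LISP_BUILTINS, s, s)
function to_julia(list::Vector{Any})
    isempty(list) && return nothing
    head, rest = list[1], list[2:end]
    if head === :define                          # (define x 1)  =>  x = 1
        return Expr(:(=), rest[1], to_julia(rest[2]))
    elseif head === :if                          # (if c a b)    =>  c ? a : b
        return Expr(:if, map(to_julia, rest)...)
    else                                         # (f a b)       =>  f(a, b)
        return Expr(:call, to_julia(head), map(to_julia, rest)...)
    end
end

# The string macro: read every form in the literal and splice the
# translated expressions into the caller's scope.
macro lisp_str(src)
    tokens = tokenize(src)
    forms = Any[]
    while !isempty(tokens)
        push!(forms, to_julia(read_sexpr!(tokens)))
    end
    return esc(Expr(:block, forms...))
end

lisp"""(display "Hello, world!")"""
```

The translation happens at macro expansion time, so the Lisp snippet costs nothing extra at runtime. A real version would need quoting, lambda, and proper error locations, which is where most of the work would be.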
Cool! It seems the Julia ecosystem is getting to the point where, if you can think of a cool idea for a package, then somebody has probably written it already!