Staged Compilation

I’ve been learning more about Racket’s staged compilation system, and I’m wondering if there is any plan in Julia to have more explicit support for the kind of principled staged metaprogramming that goes on in the Racket-verse.

Also, it seems like a lot of the Julia ecosystem is split between “fortran-style code” that ends up run in hotloops and has allocation discipline, and “lisp/python-style code” which behaves very dynamically. I think (though I’m not sure) that a lot of the flak Julia gets for precompile times comes from compiling all of that lisp/python style code. I.e., I’m not sure that the runtime performance of Plots.jl or Symbolics.jl would be that much slower if it was fully interpreted in a decent VM, and it would speed up the load times a lot. At least in AlgebraicJulia, there’s a lot of stuff that we do with small amounts of data that we don’t really need compiled, and then some stuff that we do that really does need to be compiled.

The point of this is not really to suggest a new approach to staged compilation; really I’m just wondering if someone can tell me, or point me to resources, so I can understand the design decisions that led Julia to where it is today. I’m pretty committed at this point to doing a lot of metaprogramming/language development in Julia, so I want to understand the history of the current approach to metaprogramming in Julia and work out where I can best focus my efforts to improve my metaprogramming productivity.

1 Like

I’m not sure I fully understand your main goal here, because of a possible terminology confusion. Staged functions are what we call @generated functions, which are indeed a form of metaprogramming. However, if I understand your meaning correctly, you’re more interested in “compilation stages,” i.e., the whole parsing/lowering/type-inference/LLVM IR generation/native code pipeline as well as alternatives like interpreting Julia code. If that is indeed your main interest, for clarity I’d suggest calling this “compilation” rather than “metaprogramming” since the metaprogramming manual chapter is about something quite different from the compiler pipeline. But I could be misunderstanding your main interest, so please do correct me if I’m wrong.

Your question is quite broad and there’s no one source I can point you to; the “devdocs” section of the manual, some blog posts, and various JuliaCon presentations all describe various elements of the pipeline. (Shameless self-promotion: https://www.youtube.com/watch?v=wXRMwJdEjX4 is a lengthy overview, but even it only scratches the surface.)

To one specific point:

I’ve been learning more about Racket’s staged compilation system, and I’m wondering if there is any plan in Julia to have more explicit support for the kind of principled staged metaprogramming that goes on in the Racket-verse.

I’m unfamiliar with Racket, but from this page I’d say we already have all of those things. You can see the code for all of these stages using Cthulhu.
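
For illustration, here is roughly how Cthulhu is used to inspect those stages (a minimal sketch; f is just a placeholder function of my own):

julia> using Cthulhu

julia> f(x) = sum(abs2, x);

julia> @descend f([1.0, 2.0, 3.0])   # opens an interactive menu to view the typed code,
                                     # LLVM IR, or native code for this call and to
                                     # descend into its callees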

Finally, let me close with a specific suggestion: if you’re interested in diving into this, a good starting point might be to contribute to development of JuliaInterpreter. It’s a great way to become familiar with Julia’s lowered representation, the entry point to the compiler pipeline. It’s also impactful (it’s the foundation of the debuggers), gets less attention than it needs (the people who work on it also do lots and lots of other things across the Julia ecosystem), and there are several orders of magnitude in performance separating it from execution of compiled code (there’s headroom for huge improvement). It also has several open issues, and closing some of them would give you a list of concrete goals to shoot for.
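
(For anyone unfamiliar with JuliaInterpreter, the basic entry point looks something like this; a minimal sketch:)

julia> using JuliaInterpreter

julia> @interpret sum([1, 2, 3])   # runs the call in the interpreter instead of compiling it
6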

11 Likes

Thanks Tim; these kinds of resources are exactly what I was looking for! I’ll take a look at JuliaInterpreter, and also Cthulhu (I’ve never heard of Cthulhu in this context before).

For more context, in Racket, they have a staged compilation system where they separate out module dependencies that are used at compile time, and module dependencies that are used at runtime, and this is iterated, so there can be arbitrarily many “layers” of compile vs. runtime that are lexically separated. In Julia, it seems like there’s pretty much just one level of computation; I end up having to either use eval or contort my values to fit into types when I want to pass values other than syntax trees into compile time.
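
To make that concrete on the Julia side, here is a toy sketch (my own hypothetical example with made-up names, not something from any package) of the “contort your values to fit into types” pattern: a plain value like 3 only becomes visible at compile time once it is lifted into the type domain as Val{3} and consumed by a @generated function.

# Hypothetical toy example: the value N is lifted into the type domain as Val{N}
# so that the body of the @generated function, which runs at compile time, can see it.
@generated function unrolled_fill(::Val{N}, x) where {N}
    # This runs at compile time; the returned expression is what gets compiled.
    return Expr(:tuple, (:x for _ in 1:N)...)
end

unrolled_fill(Val(3), 1.0)   # (1.0, 1.0, 1.0), with the tuple construction unrolled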

This is a great video introduction to some of the interesting features of Racket’s macro system: [PADL'23] Modern Macros - YouTube.

3 Likes

If you’re talking about compiling (not metaprogramming), then yes, lowering the optimization level would help.

Julia is already much faster with 1.9 (about to be released), since precompilation now caches fully compiled native code. It could still be faster, as I show below (I’m only testing on rc1, not the latest).

Plots.jl already lowers its optimization level to 1 (and I’m not sure a VM is needed; -O0 no longer means literally no optimizations, as it once did; it now still does e.g. constant propagation, which I understand is a win).
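
For reference, a package requests that with a module-level directive along these lines (a sketch of the general pattern with a made-up module name; the exact incantation in Plots.jl may differ):

module MyPlottingPackage

# Ask for optimization level 1 for code defined in this module only
# (Base.Experimental.@optlevel, available since Julia 1.5).
Base.Experimental.@optlevel 1

# ... package code ...

end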

This is something Julia could do by default (and then if needed opt into higher optimization levels at runtime, i.e. “tiered compilation” as with C#).

Note, this doesn’t apply to its dependencies (the setting is not inherited, I believe, which is a good thing), in particular its backend(s). Backends implemented in pure Julia would get slow if it were; not e.g. GR.jl though, since it’s a wrapper for C code.

It’s possible the lowered optimization of Plots.jl (and some other packages) needs to be reconsidered for 1.9+. I’m not sure, but could it be that the optimized code is stored somewhere and then, when you load on the default -O2, it gets reoptimized, i.e. the cached native code isn’t used? I know that happens in the other direction: when I run tests, packages get recompiled because bounds checking is forced on. Besides, even if this isn’t a big problem, there’s no longer a real need to avoid optimization, since you no longer pay for it at load time, only once at precompilation. You could even go up to -O3.

The default backend of Plots.jl loads super fast (so does PlotlyLight):

julia> @time using GR
  0.281899 seconds (354.40 k allocations: 21.572 MiB, 10.41% compilation time)

Nothing much stands out, except:

julia> @time_imports using GR
     20.6 ms  Preferences
      0.5 ms  JLLWrappers
      5.4 ms  Bzip2_jll 87.53% compilation time
[..]
      7.3 ms  Fontconfig_jll 88.01% compilation time
[..]
      0.6 ms  GR_jll
    115.0 ms  GR 5.72% compilation time

A few more of Plots.jl’s dependencies stand out (and are they all really needed, all the time?):

julia> @time_imports using Plots
[..]
    118.9 ms  ColorTypes
    228.4 ms  Colors
[..]
      0.5 ms  LogExpFunctions → LogExpFunctionsChainRulesCoreExt
      0.2 ms  OpenLibm_jll
      0.3 ms  JLLWrappers
      4.8 ms  OpenSpecFun_jll 87.51% compilation time
     24.1 ms  SpecialFunctions
      0.4 ms  TensorCore
[..]
     11.0 ms  OrderedCollections
    107.3 ms  DataStructures
[..]
      6.3 ms  Fontconfig_jll 88.57% compilation time
[..]
      5.7 ms  Qt5Base_jll
      0.6 ms  GR_jll
    143.1 ms  GR 3.73% compilation time
   2220.1 ms  Plots

So in the end:

julia> @time using Plots
  3.901799 seconds (4.29 M allocations: 267.669 MiB, 9.30% gc time, 0.98% compilation time)

Note, that’s after the very slow initial load, which triggers precompilation:

julia> @time using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
106.986772 seconds (4.23 M allocations: 262.590 MiB, 0.27% gc time, 0.14% compilation time: 27% of which was recompilation)

Which is a one-time thing, so that’s OK; it got worse in 1.9, which is justified since it pays off later.

Note, from julia --help:

 -O, --optimize={0,1,2*,3}  Set the optimization level (level 3 if `-O` is used without a level) ($)
 --min-optlevel={0*,1,2,3}  Set a lower bound on the optimization level

The latter applies to packages, I believe, and I would suggest it default to 2 or 3, and change -O to default to 1 or 0…

What I’m talking about is that there’s a lot of code that I write where the end result is “organizing a numerical computation.” Sometimes this is explicit metaprogramming, sometimes this is normal programming. But very little of it is ever run in a hot loop; I care much more about development speed/the speed of my feedback loop than actual program speed. My development speed would be faster in a language like Racket or OCaml, where there is just compilation to bytecode.

I wrote a library, CompTime.jl (GitHub - AlgebraicJulia/CompTime.jl: Library for compile-time computing in julia), in order to explore these tradeoffs.

In any case, I’m not sure how productive continuing this conversation is; Tim gave me good pointers around which to solidify my thoughts.

2 Likes

This sounds interesting. I’m actually fond of the fact that Julia has “arbitrarily many layers of compile vs. runtime”, by which I refer to the “either use eval or contort values to fit into types” practice that you don’t like.

I don’t have experience with Racket, so I wonder: what does the “lexical separation” that you mention look like? Does it offer some advantages?

You should watch the video that I linked; I thought it was pretty cool! [PADL'23] Modern Macros - YouTube

But more generally, the Racket folks have thought a lot about making a language that is super extensible in a lot of different ways, and I think that in Julia a lot of the time we want to do similar things, but end up doing them in a janky/non-compositional way.

3 Likes

I feel part of the perceived misalignment in discussion here comes from the topic itself being two-pronged. The first is staged compilation à la Racket (or LMS in Scala, etc), which is very cool (thanks for linking the PADL talk, I wonder if there are any parallels to Julia’s abstract interpreter model). The second is about how to separate “fortran-style code” and “lisp/python-style code”, ideally lexically. Though it feels like both concerns are tied together quite nicely, I think it’s valuable to separate out the second one because Julia provides some ways to handle it.

So what does Julia provide? As mentioned above, it’s possible to have fast and slow components of the same codebase by using module-level compiler directives. Implications of --compile=min and --optimize=0, for dummies does a way better job of explaining these than I can, so I’ll just add that most of the options have now been centralized under Base.Experimental.@compiler_options. Is this as elegant/composable as first-class staged programming? I’d assume not, and the module-level granularity is certainly a limitation. Does it help with the stated end goal of not paying the cost of compilation for very dynamic, not hot code? I think it’s a start, and also ties in with the discussion on JuliaInterpreter because there’s been on-and-off talk about replacing the built-in interpreter with a pure Julia implementation.
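
For a rough picture of what that looks like in practice, here is a sketch based on the Base.Experimental docstring (the module name is made up, and the option names and accepted values are experimental and may change between Julia versions):

module VeryDynamicGlueCode

# Opt this module's code out of heavy optimization and request the minimal
# compilation mode; code in other modules keeps the default settings.
Base.Experimental.@compiler_options optimize=0 compile=min

# ... glue / plotting / symbolic-manipulation style code that is never hot ...

end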

2 Likes

Thanks for linking that article; very interesting! I didn’t know about module-level compiler directives; I think I will start using these!

1 Like