Why are compiler devs in such high demand for Julia?

From the different automatic differentiation packages in Julia such as Zygote, Diffractor, and Enzyme, each handling it at a different level of the stack, to LoopModels, to improving precompilation, and so on. Moreover, Julia wants to define its memory model in order to support atomic operations, wants to move the GC to LLVM GC, etc., which may require even more compiler devs, or at least their cooperation to ensure things get designed correctly. In other languages, I see the compiler as a static thing while libraries get developed on top of it. Here, I see that commonly used things need compiler devs working on them. How did it turn out this way?

2 Likes

What’s this LLVM GC you’re talking about?

Other than that, I think most of the difference is that Julia is young and not backed by a big company. FAANGs have a bunch of people working on LLVM/Python/C#/Java compilers; I think a lot of the difference is just how visible that work is.

7 Likes

LLVM provides some garbage-collection functionality that languages can use. If I’m not mistaken, Julia is trying to use this functionality instead of implementing everything from scratch; as far as I can tell, LLVM provides support for describing GC requirements rather than a full GC implementation. I’m not an expert, but in short, I believe Julia is trying to use this for its GC.

I don’t believe this is correct. There is work going on to potentially move Julia’s GC to MMTk, but that has nothing to do with LLVM.

1 Like

Maybe I’m wrong, or maybe we’re talking about different parts of the GC. The LLVM documentation states that it only helps describe the GC requirements; languages still implement the GC themselves in their own runtime.

This doesn’t seem right to me. The languages you’re describing are just a lot more stable, and there’s no expectation that most new features will be implemented as language changes, so something like AD gets built as a library rather than as part of the compiler.

1 Like

Can you post where you saw this information about Julia ↔ “LLVM GC”? Assuming the latter refers to Garbage Collection with LLVM — LLVM 17.0.0git documentation, a quick search of GitHub and Discourse doesn’t appear to turn up anything relevant.

Back to the main topic, all of the things you mention for Julia have been done for other languages as well in recent memory. For precompilation see native compilation on the JVM and .NET. For source-to-source AD see Swift for TensorFlow, Tapenade (Fortran, C), Adept (C++) and the various efforts in Scala/Haskell/etc. Even Enzyme is presented first as working on C code. I’m not going to try to list loop optimizers since a new one seems to come out every week, but suffice it to say the vast majority of them were developed for other languages.

Perhaps what differentiates Julia is how close these projects are to core language development? Many of the non-Julia examples are/were more detached, and more than a few were research projects that never saw any actual production use. This perception and the overall centralization of the Julia community may also explain the feeling behind the original question.

AD in Julia was also first built as libraries, so I wonder if the question is more one of priorities and maintenance (e.g. how much upkeep non-source-to-source ADs get in Julia vs. other languages) than of the absolute number of compiler-related projects.

2 Likes

Julia has a JIT, multiple/dynamic dispatch, a non-moving GC, and interactive code invalidation, while CPython, C++, etc. do not have all/some of these. Additionally, Julia is a dynamic language, and we cannot compile all code ahead of time. Thus those other languages have different codegen and optimization requirements (and these features do not come for free just by using LLVM or some other library). And because we have these very useful features, our ecosystem leans on them quite heavily to get excellent performance and features for things like AutoDiff. Thus, in order for Enzyme, Zygote, et al. to work well, we don’t just need LLVM experts; we need experts in LLVM + Julia + GC + etc. This is a smaller group of people, although there is certainly some overlap with existing compiler-development populations.
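
To make the JIT + multiple dispatch point concrete: a single generic definition gets a separately compiled specialization per concrete argument type, and that machinery is what tools like AD have to cooperate with. A minimal sketch (the function name `mysum` is made up for illustration; the reflection macros come from InteractiveUtils):

```julia
using InteractiveUtils  # provides @code_typed / @code_llvm outside the REPL

# One generic definition...
mysum(xs) = reduce(+, xs; init = zero(eltype(xs)))

# ...but the JIT compiles a distinct specialization per concrete element type:
@code_typed mysum([1, 2, 3])         # specialized for Vector{Int64}
@code_typed mysum([1.0, 2.0, 3.0])   # specialized for Vector{Float64}

# And @code_llvm shows the LLVM IR Julia hands to the backend:
@code_llvm mysum([1.0, 2.0, 3.0])
```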

Also, relatedly: because Julia code has such close access to the compiler via reflection, Julia libraries often do things that are quite close to standard codegen/compiler optimization, which again requires specific expertise, compared to an approach that does not use those features.
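
As a toy illustration of “library code that is really doing codegen” (not taken from any real package), a `@generated` function sees only the argument types at compile time and returns the expression that becomes the method body for that specialization:

```julia
# The generator body runs at compile time with access only to the *types*;
# the expression it returns is compiled as the method body for these N and T.
@generated function unrolled_dot(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T}
    terms = [:(a[$i] * b[$i]) for i in 1:N]
    return :(+($(terms...)))
end

unrolled_dot((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))  # fully unrolled: returns 32.0
```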

16 Likes

And the obvious follow-up: if we want Julia to advance faster as both a language and as an ecosystem of cooperating, reliable libraries, we need some way to encourage developers to gain the relevant experience to become compiler developers who know how to work on Julia and its libraries. Doing so involves a mix of things:

  • Better developer documentation for Julia’s internals and library internals
  • Better inline code documentation in Julia’s compiler
  • Examples and tutorials for leveraging Julia’s codegen and reflection capabilities to do useful things like AD “from scratch” (see the sketch after this list)
  • Examples and tutorials for doing development at the level of Julia IR and LLVM IR, from Julia
  • High-level and mid-level documentation of how Julia’s key features (JIT, multiple dispatch, etc.) work in practice
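
As a sketch of what the “AD from scratch” material in the third bullet could look like, here is a minimal forward-mode example using dual numbers. The `Dual` type and `derivative` helper are purely illustrative, not the API of ForwardDiff or any other package:

```julia
# Minimal forward-mode AD: carry a value and its derivative together.
struct Dual <: Number
    val::Float64   # f(x)
    der::Float64   # f'(x)
end

# Differentiation rules are just ordinary method definitions:
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Seed the derivative and read it back out:
derivative(f, x) = f(Dual(x, 1.0)).der

derivative(x -> 3 * x * x + sin(x), 2.0)   # == 6 * 2 + cos(2)
```

Reverse mode, mutation support, and good performance are where the compiler expertise discussed above starts to matter.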
21 Likes

Those would be nice to have, but who would do them?

1 Like

It gets worse though.
Compiler development doesn’t exist in a vacuum.
For example, if you want a compiler dev who can make linear algebra efficient, you need a compiler dev who knows linear algebra. Take a loop optimization that vectorizes loops: now you need developers who know cache optimization, SIMD, and parallelism. For (efficient) automatic differentiation, you may need to know all of these plus automatic differentiation techniques. Julia was written as a greedy language, and now it makes greedy demands for skilled devs in return.
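
To make that concrete with Base’s own annotations (the function name is made up for illustration): even a basic dot product already mixes bounds-check elimination, SIMD, and memory-access concerns in a single loop:

```julia
# Even a simple reduction touches several of those skills at once.
function dot_simd(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(a, b)  # @simd lets the compiler reorder the reduction
        s += a[i] * b[i]
    end
    return s
end

dot_simd(rand(1000), rand(1000))
```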

Hah, yeah, I didn’t intend that as a call to action (there are enough of those posts here). Anyway, I expect that developers who are just getting into learning these things (for one reason or another) are probably good candidates, as they may not already be 120% overcommitted, but realistically it’d probably have to be the current compiler devs, given how much material there is to cover.

My hope would be for the average developer to learn a bit here and there (it’s not “rocket science”; most compiler/runtime details are actually quite mundane), since there are so many users who are just on the cusp of grasping the necessary details to become compiler/runtime developers.

3 Likes

I share the same feeling.

PL and compiler devs are in high demand in every programming language community, due to increasingly complicated language features (threading, memory models, incremental compilation). Unfortunately, the group of people who work on PLs and compilers is really small globally. The emergence of AI does bring new blood into this field, but they mainly focus on Python or their own DSLs. Even with the bright future of AI, compiler development progresses slowly due to the innate hardness of the problems it has to solve (for example, loop vectorization and polyhedral compilation were added to LLVM only recently).

That’s why I hold a pessimistic view of Julia. Julia’s main target is researchers in traditional scientific computing. Few of these people can contribute to the language itself, and I don’t think it’s their responsibility to do so; they should focus on their research instead of the internal details of a compiler. But the weaknesses of Julia are obvious to “hardcore” CS programmers: it has limited compiler tooling (no static compilation, no type checking, etc.), unusual language features, and few established application domains.

Unfortunately, Julia is not C/C++. Your “average” developer is already a high standard for many people (both CS and non-CS). C/C++ programmers are forced to learn at least the edges of internal compiler details in school because they have no choice. And if they want to be an SDE, it’s likely they will use Java/Go at work instead of C++, so the demand for that knowledge in depth is even lower. That said, not many users can become compiler/runtime developers, not in other static languages’ communities either.

2 Likes

Meh, disagree. Some C/C++ developers know how their language features compile to ASM, and usually not much more (if that). And they might know refcounting, which is super trivial compared to GC semantics. Anyway, the average C/C++ developer blindly copies code from SO without understanding it, just like everyone else (including Julia users; nobody in either community is special just because they use a certain language).

Also, how many C/C++ devs contribute to Clang/LLVM or GCC? Probably a far smaller ratio than that of Julia users who end up contributing to Julia/Base/Stdlibs.

2 Likes

In the worst case, it comes with a runtime, like Java does. Unlike many devs, I started out with Python and still prefer the ease of use of high-level languages. Maybe I’m different from many, but I’m pretty sure there are plenty of devs who prefer high-level languages because they are economical.

I am not comparing the ratio of excellent programmers in Julia and in C++. If you read my comment carefully, what I want to emphasize is that in any community the ratio of such programmers is extremely low, even in the C/C++ community, whose programmers are commonly believed to be highly skilled. Still, I believe that ratio is higher in the C/C++ community, due to the different user groups.

Therefore, it’s impractical to expect that “so many users” in the current community can become compiler/runtime developers. The only way left is to expand the community and increase the absolute number of such programmers.

But they also have other compiler projects: Open64, Halide, TVM, icc, and so on. Clang/LLVM is just one of those projects. If Base is counted for Julia, then the stdlib/Boost should be counted for C++…

1 Like

While I agree with the general sentiment, it’s fair to say Julia has a proportionately larger number of “we’re waiting for this compiler thing to land” (whether in Core or outside of it) responses to inquiries about improving the overall library ecosystem. Discussing the causes and implications of this would be an interesting digression, but it’s certainly a phenomenon I’ve noticed more in this community than in peer languages or most “mainstream” ones.

2 Likes

Honestly speaking, it’s a bit overconfident to compare Julia’s runtime to Java’s or Python’s. The latter are the pearls of compiler technology.

You are right. But while writing in high-level languages is economical, maintaining the code is another problem. In the 90s, dynamic languages could still compete with static languages. Nowadays, purely dynamic languages are almost dying out, because program analysis is way cheaper than before (computing power is abundant) and project sizes have grown considerably. Programmers are expensive, but computers are cheap.

For Julia this is even clearer. Julia has no type checking, so for a long time people have used code_typed to debug type instability. This works pretty well when you are prototyping a small codebase, since you won’t be interrupted by a type checker. But when you have a large codebase and need to handle different input types, it quickly becomes a headache and uneconomical…
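
For readers who haven’t seen that workflow, a minimal sketch (the functions are made up for illustration; `@code_warntype` and `code_typed` are the standard tools from InteractiveUtils):

```julia
using InteractiveUtils  # @code_warntype / code_typed outside the REPL

# The return type depends on a runtime value, so inference reports Union{Int64, Float64}:
unstable(x) = x > 0 ? 1 : 2.0

@code_warntype unstable(3)      # highlights the Union return type
code_typed(unstable, (Int,))    # programmatic equivalent

# A type-stable rewrite has a single concrete return type:
stable(x) = x > 0 ? 1.0 : 2.0
@code_warntype stable(3)
```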

2 Likes