Compiler work priorities

It may fall under the compile-time latency items “caching more things” and “PackageCompiler,” but I would agree that “getting Julia to the point where it generates great .so files that can be used by other languages” would likely have a positive impact on Julia adoption.

42 Likes

In addition to what Stefan said, the ways to accomplish more compiler projects are:

  1. If you know of funding opportunities, please let me know and it will help us at Julia Computing grow our compiler team and execute more compiler projects.
  2. The Julia Lab at MIT is another place where this could be a research topic for the right person.
  3. Influence tech companies to contribute engineering manpower if they can’t contribute money. Many of the large tech firms make substantial open-source contributions. The more vocal Julia users are, the more likely we are to get contributions.
  4. Discuss with people at National Labs. The more they adopt Julia, the more we will get contributions and capabilities from that community.

Personally, I work on all the options above. I find that there are Julia users in every university, company, national lab, government agency, etc., but we need to be more vocal about asking for help, whether money or time.

-viral

35 Likes

Thanks for laying it out. One thing I would like to ask, though, is: how can I help? I don’t have the time to be a core contributor there (my efforts are probably best kept concentrated in DiffEq), but there’s got to be something I can do. Fixing compile times is quite important to me, so I would like to offer myself as an extra set of hands who knows packages that take a long time to compile. I opened Understanding the compile times of DifferentialEquations.jl and Attempting to Help to get feedback but haven’t gotten any. Hopefully there’s a way to decrease the burden on the few compiler workers and better distribute the work so it gets done!

16 Likes

Not sure whether this is a valid way of accomplishing more compiler projects, but I find the barrier to entry for the compiler somewhat higher than for Base.

This is because the data structures and source for Base are pretty apparent, well-documented and you can always play around in the REPL. In contrast, the compiler is harder to hack for relative newcomers.

Are there any guides for getting into that? Guides for setting up a separate compile chain (at least the Julia parts) for hacking, e.g. using Revise or ExtraCompiler, so that I can make modifications that don’t break my session (beginning with inserting prints, until the debugger improves)? Descriptions of the internal data structures?

Is there a sensible way of helping with that (given the constraint that I currently don’t get how everything fits together)?

Re caching more things: if I finally pass the barrier to entry for compiler hacking, something I’d like to try is a persistent (cross-session) cache. The idea would be that each compiled entity gets a collision-resistant hash of its inferred code, and we maintain a Merkle DAG where hash(A) incorporates hash(B) if A depends on B, i.e. if there is a backedge B → A. Here “compiled entity” means a component of the dependency graph (directed cycles need to be collapsed). That way, much of LLVM’s work, and possibly some earlier optimizer passes, could in theory be cached (last time I brought this up I was told that linking is not ready for caching native code; and I still don’t get the world-age system).
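A minimal sketch of that hashing scheme, in Python for illustration only (the names `entity_hash` and `cache_keys` are mine, the graph is assumed already acyclic, i.e. cycles collapsed, and none of this corresponds to an existing Julia mechanism):

```python
import hashlib

def entity_hash(code, dep_hashes):
    """Collision-resistant key for one compiled entity: its own
    (inferred) code plus the keys of everything it depends on."""
    h = hashlib.sha256()
    h.update(code.encode())
    for d in sorted(dep_hashes):  # sort so the key is order-independent
        h.update(d.encode())
    return h.hexdigest()

def cache_keys(entities, deps):
    """entities: name -> code string; deps: name -> list of dependency names.
    Returns a stable cache key per entity, computed bottom-up over the DAG."""
    keys = {}
    def key(name):
        if name not in keys:
            keys[name] = entity_hash(entities[name],
                                     [key(d) for d in deps.get(name, [])])
        return keys[name]
    for name in entities:
        key(name)
    return keys

# A change to B invalidates A's cached code automatically, because
# hash(A) incorporates hash(B) along the backedge B -> A:
k1 = cache_keys({"B": "code of B", "A": "code of A"}, {"A": ["B"]})
k2 = cache_keys({"B": "new code of B", "A": "code of A"}, {"A": ["B"]})
assert k1["A"] != k2["A"]
```

The point is that a cache lookup by such a key is valid without any separate invalidation protocol: if anything below an entity changed, its key changed too.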

15 Likes

Thanks for this writeup. There is something I feel belongs on this list: more detailed documentation on what the compiler is expected to optimize, and better tools for inspecting the cases where it didn’t.

I often run into cases where I wonder if a particular optimization or type inference didn’t happen because I didn’t code things the right way, or because the compiler doesn’t handle that case optimally (yet). The best strategy I know at the moment is asking here and reading Base code for ideas, but the compiler is often a black box to me. @code_warntype is useful for finding out what happened, but sometimes it is harder to say why it happened and what I can do about it.

11 Likes

I agree wholeheartedly, and I think the main blocker is the way multiple dispatch is resolved on some base module (usually Base) instead of being resolved by the caller.

The general outline, along with some overly emotional correspondence (mainly on my side), can be found in this thread:

I also wrote a small POC using a macro that generates a generated function… the added indirection gave me enough flexibility to show that this is possible.
If there is interest, I will “patch” the POC to work on Julia 1.0.

1 Like

I can’t agree more. But another obstacle for this goal is that the Julia runtime takes over all signal handling (though I guess this is not the compiler’s doing, so this is a bit off topic). It is especially problematic for the SIGINT handler, since you can’t respond to Ctrl-C in your own program as soon as you initialize the Julia runtime. It would be nice to have an API to initialize the Julia runtime without taking over all signal handling, or at least not SIGINT (something like CPython’s Py_InitializeEx(0)).

1 Like

2 posts were split to a new topic: Making it easier to contribute to Julia

I hear what you’re saying about multithreading, but threads have a bit of a bad reputation as tools for parallel computation (and I say that as someone who has used threads quite a bit, and successfully, even if I say so myself).

What I wonder about is the support of the other implementation mechanisms for parallel computing (remote channels and remote references). How solid is the code at the moment, and are any changes/improvements planned?

I suppose any work in this area would perhaps not fall under the “compiler” heading, so maybe this is not the place to ask?

3 Likes

I’d also like to ask whether the PackageCompiler topic will also focus on reducing the number of required dependencies: not linking libraries that are never called in the code, and full AOT with the JIT runtime disabled. This would support the use case where a scientist develops a “business-logic” dynamic library to be deployed as part of a big C# / C++ application, without 500 MB of dependencies (why ship FFTW when it’s never used?) and without unused stdlibs.

3 Likes

Updating Cxx.jl is very important; many repositories rely on it.

Just noticed that there seems to be a new commit from Keno himself. Very exciting!

It would be really nice if more people get involved so that we are not relying on one very busy guy for absolutely everything.

17 Likes

Agreed. @zsz00 and others who mentioned Cxx, checking out the branch and doing obvious fixes for test failures might earn brownie points for showing you care enough to help out. While Cxx surely has a lot of difficult-to-update components, many packages require a certain amount of annoying but fairly routine work that’s pretty easy for most people to help out with. If someone can take that burden off Keno, it might clear more time for him to focus on the more difficult pieces.

11 Likes

Some developments on Cxx.jl are being reported in the comments of https://github.com/Keno/Cxx.jl/issues/390

2 Likes

+1 for this, but

:laughing: :laughing: :laughing:

3 Likes

I’ll cross-post this here for broader attention:

@sdanisch has posted about how to compile Julia for wasm—if you’re looking to get involved or help with compiler work in a high-impact way that is fairly accessible, this is your chance! Also, think about how fun it will be when you get Julia running in a browser :grin:

20 Likes

I had another one of those really painful “I’m trying to impress someone with how cool Julia is” moments, and then it ended (again, as so many times) in a long discussion that, well, the performance one sees initially is really not representative, JIT times, you know, really it is fast, just don’t believe what you are seeing right now… I’ve been through that experience so many times now, and in my mind it really is a major, major barrier to wider Julia adoption. I cannot even count how many folks clearly lost all interest in those first five minutes of exposure…

In any case, from my point of view, while I cannot wait to see all the PARTR stuff, the compile-time latency issues that are listed as priority 3 in the original post would go way up to the top of my list. I have at least a handful of projects that would really benefit from better multi-threading, but in pretty much everything I do, the compile-latency issues are way, way more problematic. In particular, they make it hard to effectively advocate for Julia, and many of my projects (not the stuff the community here knows about around Queryverse.jl, but my actual scientific work) really need buy-in from other, currently non-Julia users.

So, my hope would be that things like this and this might make it onto a julia 1.2 milestone :slight_smile:

Now, I can of course follow the rationale for the current ranking that @StefanKarpinski wrote down, but at least in my situation (as a user) that is not the right trade-off. I completely understand that I might be in a minority, and that other users’ needs are different (and more important), but I wanted to make sure this point of view also gets voiced.

45 Likes

I know open-source development isn’t a democracy and definitely folks should work on what’s important to them, but it would make me very happy to see improvements in the time-to-first-X experience. For me it’s not really about evangelism, but pure selfishness. Despite using Juno (and sometimes Revise) I still find myself restarting Julia pretty often (switching environments, or changing types or const values). Whenever I use python I’m always reminded of how fast it feels, because most things happen instantly. I think over the lifetime of my using Julia I’ve probably spent more time waiting for things to JIT than I’ve saved in faster execution for computationally-intense work.

26 Likes

Whether you’re in the minority or not, I don’t know, but I have the same preference. Lower latency/faster compilation is really the only feature I wish for at the moment.

10 Likes

Just a quick counterpoint to the way this thread is heading: I couldn’t care less about compile time. My work consists of very demanding, long-running simulations. Coming from C++, I’d be happy to wait an hour for compilation if it gave 1% better performance. And multithreading is essential.

My personal priorities would be 1,2,4,3

4 Likes