@kpamnany’s talk at the CAJUN meetup from last year addresses some of your questions:
In particular, by allowing the PARTR depth-first thread scheduler to map tasks onto kernel threads and do work-stealing, composability is addressed: with this design, you can write threaded libraries and then write concurrent code that calls those libraries, and it all composes. Moreover, since the scheduling is depth-first, the innermost threaded library call gets the parallelism first, which is a unique property of PARTR and what makes it cutting-edge research. This means that you get the parallelism where you want it: at the lowest-level library, where typically someone has spent a lot of time, energy, and expertise making sure that threads are used effectively.
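To make that concrete, here is a minimal sketch in Julia of what composable nested parallelism looks like, assuming the `Threads.@spawn` task API that this runtime work targets; `inner_library_sum` and `outer` are hypothetical names for illustration, not real library functions:

```julia
using Base.Threads: @spawn

# Hypothetical "threaded library" routine: splits a range and sums the halves
# in parallel. It knows nothing about its callers' threading.
function inner_library_sum(xs)
    mid = length(xs) ÷ 2
    a = @spawn sum(@view xs[1:mid])   # a task the scheduler maps onto a kernel thread
    b = sum(@view xs[mid+1:end])
    return fetch(a) + b
end

# Concurrent caller code: each call into the threaded library just works.
# With depth-first scheduling, the innermost (library) tasks get threads first.
function outer(columns)
    tasks = [@spawn inner_library_sum(col) for col in columns]
    return map(fetch, tasks)
end
```

Neither layer needs to know whether the other uses threads; the scheduler sorts out how tasks map onto the available kernel threads.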
The rest of your questions are about locking, which presumes that locking is the right way to synchronize threads. Julia’s new parallel runtime, once it’s fully hooked up, will be very similar to Go’s. Since Go has been very successful for writing highly concurrent code, we should take as many lessons from it as we can. The biggest lesson is nicely explained in this blog post:
The key quote from this post is:
> Do not communicate by sharing memory; instead, share memory by communicating.
In other words, instead of sharing data structures and protecting them with locks, let a single task manage each data structure and use channels to communicate with that task; protect access to anything that needs synchronization that way. Note the complete absence of locks or slow, complex concurrent data structures in the Go example code at the end of that post. Instead of locking and using concurrent data structures, idiomatic Go programs use normal data structures without locks, but only a single task “owns” a data structure at any given time and is allowed to modify it. (Unlike in Rust, this is not enforced by the compiler; it is just a matter of program design and convention.) This can take the form of sending update requests to a single task whose job is to handle them, or of passing references to the data structure around and having whichever task holds a reference make its updates before passing the data structure on to another task.
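Here is a minimal sketch in Julia of the first form, again assuming the `Threads.@spawn` and `Channel` APIs; the counter service and the shape of its request tuples are made up for illustration:

```julia
using Base.Threads: @spawn

# One task exclusively owns `counts`; everyone else talks to it over a channel,
# so the Dict itself needs no lock.
function start_counter_service()
    requests = Channel{Tuple{String,Channel{Int}}}(32)
    @spawn begin
        counts = Dict{String,Int}()          # never touched by any other task
        for (key, reply) in requests         # ends when `requests` is closed
            counts[key] = get(counts, key, 0) + 1
            put!(reply, counts[key])         # send the new count back
        end
    end
    return requests
end

# A caller sends a request and blocks until the owning task replies.
function increment!(requests::Channel, key::String)
    reply = Channel{Int}(1)
    put!(requests, (key, reply))
    return take!(reply)
end
```

Since iterating a `Channel` terminates once the channel is closed and drained, closing `requests` is enough to shut the service down cleanly.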
In this approach to multithreading, locks are far less important than they are in the traditional C++/Java style, where many threads try to safely access a shared data structure at the same time. You may need them occasionally, but I hope we can avoid having “threaded” and “non-threaded” versions of every data structure imaginable. The flip side is that the task and channel implementations have to be really good and fast to make the Go approach practical. But we want that anyway.