Announcing Air.jl: STM and immutable collections

Air.jl is a new package for Julia that implements a number of composable multi-threading utilities largely inspired by paradigms from Clojure. These include:

  • Immutable collections
    • Persistent Dictionaries
    • Persistent Sets
    • Persistent N-Dimensional Arrays
    • Persistent Weighted Dictionaries and Sets (which act as heaps, discrete distributions, and dictionaries/sets)
    • Lazy Dictionaries
  • A Software Transactional Memory system
    • An asynchronous Actor type, based on Clojure’s agents
    • A Volatile type that supports thread-safe mutations within transaction code blocks, based on Clojure’s refs
    • A thread-local variable type

Air is pre-v1.0, but has been reasonable tests and performance comparable to Clojure for update operations and to Julia’s native collections for lookup operations.

I’m eager to have more eyes on this project and look forward to all the future GitHub Issues!

Air.jl on GitHub

13 Likes

I’m glad people are working on persistent collections. Would you be able to compare Air.jl’s types to FunctionalCollections.jl’s types?

3 Likes

My understanding is that FunctionalCollections.jl is no longer actively maintained (though I am not certain of this). Regardless, here are a few differences in just the API for the collections:

  • Air.jl attempts to mimic the native Julia types more closely. In FunctionalCollections.jl, for example, you use assoc(dict, key, val) with the PersistentHashMap type while in Air.jl you must use either push(dict, k => v) or setindex(dict, v, k) with Air’s PDict type. The PDict functions imitate the native Dict's versions: push!(dict, k => v) and setindex!(dict, v, k).
  • Air's PArray{T,N} <: AbstractArray{T,N} type is an N-dimensional persistent array type whereas the FunctionalCollections type PersistentVector{T} <: AbstractArray{T,1} only supports vectors.
  • Air's PWSet and PWDict types can act as priority queues (first(pwset) yields the item in the set with the highest weight in O(1) time), similar to FunctionalCollectionsPersistentQueue, but Air's types also act as sets or dicts, and they can be updated and sampled (rand(pwset) yields a random element of the set with the probability of each element proportional to its weight).
  • Air's pop(coll) function yields (last, most) whereas FunctionalCollections uses last = peek(coll) and most = pop(coll). (The first and rest can be obtained via popfirst.)
  • Air supports O(1) push/pop and O(1) pushfirst/popfirst.
  • Air supports dictionaries whose values are calculated lazily and cached once requested (LazyDict), which has no equivalent in FunctionalCollections.
  • FunctionalCollections includes a PersistentList which is currently not supported by Air.

Additionally, there are a number of other minor UI differences, some of which just reflect how young Air is so far. And, of course, Air included multi-threading utilities like Actors, Volatiles, Vars, transactions, Delays, and Promises.

In terms of performance, I haven’t looked deeply into the FunctionalCollections core code, nor have I run any benchmarks on the FunctionalCollections types, but this is a good idea. I’ll try to add these when I add some careful benchmarks of Air to its documentation in the next couple weeks. My intuition after glancing at the FunctionalCollections bitmapped trie code is that Air may be faster because FunctionalCollections is using abstract types in some critical places that probably cause type instability, but until we have benchmarks, I would suggest some skepticism about this analysis.

(Apologies for the many typos—my current keyboard is giving out.)

5 Likes

Very cool thing!

Using task local storage for transaction log means f() and @sync @spawn f() are not interchangeable, which makes code composition rather tricky. Although arguably you shouldn’t synchronize inside tx. What Clojure does this in this case? Does it forbid synchronization inside transactions?

Also, how do you implement a multi-dimensional persistent array? Is it a wrapper around persistent vector(s)?

Easy answer first: The PArray type uses a bitmapped trie to store the elements by their linear order, so essentially yes, there is a sort of persistent vector representation under the hood.

The transactions of Air are a multi-threading tool that is mostly orthogonal to @sync. If you want to syncronize across many threads, you probably want to put the transaction blocks in each of the @spawned threads and not around the @sync. The point of a tx block is to have a single-threaded piece of code that is atomic for all the actors/volatiles inside it and thus can be run at the same time as other single-threaded tx blocks. Unlike functions that use mutexes and conditions to protect mutable data, the tx blocks are perfectly composable as long as they obey this single-threaded requirement, though. If you need to sync/spawn inside a tx block, then you must ensure that all the @spawned threads do not touch any Volatiles or Actors.

One of the quintessential examples for this kind of system is having a set of threads that process diverse commands arriving asynchronously from many clients (like a web server)—each client’s thread can read a command, then, inside of a tx block, process it without regard for whatever the other clients’ threads are updating. Again, if one command needs multiple threads, each thread would use its own tx block.

Yes, I think I know the intended use cases of STM. As I said, synchronization in transactions is arguably a bad idea. However, the usual pitch of STM is that it’s good for compositional API since nested transactions can be merged. This does not work well in general in Julia because using tasks or not is an implementation detail of a method and there’s no way to query this statically. But I don’t think it’s impossible. I sketched a design for ensuring method properties such as purity and yield-free: Suggestion: Fortran-like Keyword to Assert Function is Pure - #52 by tkf

For a bit of more context on my end, this is the reason why I’m interested in GitHub - JuliaConcurrent/Reagents.jl: Composable nonblocking and synchronization programming framework which is kind of a fusion of STM and usual nonblocking programming. It can compose both nonblocking and synchronizing programs.

2 Likes

I see; just to make sure I understand—you are saying that transactions aren’t generally composable, because a method that I call inside of my transaction might itself use multiple threads. However, this critique applies to literally every composition in Julia (see paragraph 3), so I don’t really think this is much of an issue—the real problem arises here from mixing and matching different concurrency paradigms and not knowing whether the functions you are calling obey your program’s concurrency paradigm, not from using other tasks. You can use @sync and other tasks in your transaction as long as the thread of the transaction is the only place where volatiles and actors are touched (i.e., as long as your function f and its synced threads obey Air’s STM paradigm).

Here’s a slightly better explanation of how I think about this. Every solution to the abstract concurrency problem requires some kind of restriction on the structure of the program that is written (i.e., if you give me procedures f and g that have no limits on them, then it’s impossible to ensure that they will work concurrently without imposing some kind of structure on them). The limitation enforced in Air (and Clojure, Scala, and others) is to require that mutable values be wrapped by Volatile objects and that transaction blocks local to the task enclose all atomic operations over those mutables. If you violate that assumption, then the concurrency will break.

Put another way: it’s also an implementation detail of a method whether it uses a mutex. But the use of a mutex in a method fundamentally makes it non-composable because composition can quickly and easily lead to deadlocks. In this sense, nothing in Julia is safely composable in the general sense unless you can put some limitations on what you are composing.

The reagents library looks very interesting! I’m not very familiar with this approach myself. It appears to me that in the case of reagents, you can allow a slightly greater level of composability; however, it also looks like you are giving up on certain other things—for example, it’s not clear to me how one could have a short block of code that read from a few global variables then changed a few mutable values (that are shared across threads) safely. If you did this in one of your reagents, you would break your program. But again the problem there would be that you mixed concurrency paradigms and not that reagents fundamentally don’t support composition. (That said, I have just perused the reagents readme; I may be misunderstanding somewhat.)