Luvvy - Use Actors liberally for robust, highly parallel Julia code (prototype)

tl;dr: Richard Palethorpe / Luvvy · GitLab (It is not registered as a package yet and probably only works with Julia 1.3+.)

Following on from the discussion I had with @c42f in this PR for @async, I was inspired to try using the actor model with Julia. The actor model is something that really fascinates me, but I have found the various actor model libraries I have used quite clunky. They don’t seem to really encourage liberal use of actors and message passing, which is very much unlike the formal actor model as described by Gul Agha in Actors. Perhaps a big exception to this is Erlang and Elixir.

Due to the flexibility of Julia and especially multi-methods, I think it should be possible to create an actor library which allows you to express almost everything in terms of actors and message passing in a reasonably practical way. This means that structuring your code in terms of actors and messages is on a similar level of verbosity to structuring your code in terms of plain structs and methods in a synchronous style.

This may have some serious performance consequences, both negative and positive, but it is most interesting from a functionality perspective IMO. Using actors means you automatically get parallelised, asynchronous and reactive code. It also gives you natural error boundaries and isolation. Extra thought must go into creating actor-based code, but if you want to use multiple cores or machines, then this is the case anyway.

There are a huge number of challenges in making this work correctly and with reasonable efficiency, but for now I am just focused on making a nice API. Below is a basic hello world program from the README (which contains a few more details).

using luvvy

"Our Actor"
struct Julia end

"Our Message"
struct HelloWorld! end

# Set handler for all actors for the message HelloWorld!
luvvy.hear(s::Scene{A}, ::HelloWorld!) where A =
	# We shouldn't really use println
	println("Hello, World! I am $(A)!")

# Set the Genesis! message handler for the builtin Stage actor
function luvvy.hear(s::Scene{Stage}, ::Genesis!)
	# Julia enters the stage
	julia = enter!(s, Julia())

	# Send the HelloWorld! message to Julia (The Stage talks to some actors)
	say(s, julia, HelloWorld!())

	# The stage leaves, but don't worry, Julia can say her line before
	# gravity takes effect.
	leave!(s)
end

# Create the stage and send it Genesis! (this blocks until the stage leaves)
play!(Stage())

We could drop the message struct and just send Val(:symbol), dispatching on the type Val{:symbol} in the handler

luvvy.hear(s::Scene{A}, ::Val{:Hello_World!}) where A = ...
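The Val trick can be tried in plain Julia without luvvy at all. The following is a minimal sketch; `react` is a hypothetical function invented for illustration, not part of the library. It shows that `Val(:hello)` builds a value whose type is `Val{:hello}`, so each symbol can select its own method.

```julia
# Hypothetical handler, independent of luvvy: dispatch on the symbol carried in the type.
react(::Val{:hello}) = "Hello, World!"
react(::Val{:bye}) = "Goodbye!"
react(::Val) = "I don't know that line."   # fallback for any other symbol

# Val(:hello) constructs an instance whose type is Val{:hello}
greeting = react(Val(:hello))
unknown = react(Val(:encore))
```

The trade-off is that a Val message cannot carry any fields, so it only suits messages that are pure signals.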

Here is a slightly more complex scenario taken from the tests.

# Popularity begets popularity
#
# Script:
#   We create the stage and this triggers Genesis!
#   In the handler for the Genesis! message we create two actors
#   One actor is created by sending the Enter! message (Nigel)
#   The other actor is created inline (Brian)
#   When Nigel's Enter! message is processed Entered! is sent
#   In the Entered! handler we ask all the other actors who loves who
#   Each actor receives WhoLoves! messages asking if they love another actor
#   They spawn a Stooge (with delegate()) to query the other's popularity
#   (if they didn't it could result in deadlock)
#   If the other actor is more or equally popular, they give them love
#   Brian is more popular than Nigel so he gets some love and Nigel doesn't
#   After Brian increases his popularity, he tells the whole Stage to leave
#   When the Stage receives the Leave! message, it tests Brian's popularity
#   The library then tells all the actors to leave.
#
@testset "luvvies sim" begin
    struct Actor
        name::String
        pop::Int
    end

    struct WhoLoves!
        re::Id
    end

    struct HowPopularAreYou!
        re::Id
    end

    luvvy.hear(s::Scene{Actor}, msg::HowPopularAreYou!) =
        say(s, msg.re, my(s).pop)

    luvvy.hear(s::Scene{Actor}, msg::WhoLoves!) = if me(s) != msg.re
        delegate(s, my(s).pop, msg.re) do s, my_pop, re
            other_pop = ask(s, re, HowPopularAreYou!(me(s)), Int)

            my_pop <= other_pop && say(s, re, Val(:i_love_you!))
        end
    end

    luvvy.hear(s::Scene{Actor}, ::Val{:i_love_you!}) = let state = my(s)
        my!(s, Actor(state.name, state.pop + 1))

        say(s, stage(s), Leave!())
    end

    function luvvy.hear(s::Scene{Stage}, msg::Entered!)
        roar(s, WhoLoves!(msg.who))     # Nigel
        roar(s, WhoLoves!(my(s).props)) # Brian
    end

    luvvy.hear(s::Scene{Stage}, ::Genesis!) = let st = stage(s)
        say(s, st, Enter!(Actor("Nigel", 0), st))
        my(s).props = enter!(s, Actor("Brian", 1))
    end

    function luvvy.hear(s::Scene{Stage}, msg::Leave!)
        @test ask(s, my(s).props, HowPopularAreYou!(me(s)), Int) == 2

        leave!(s)
    end

    play!(Stage())
end

From my POV the most interesting things here are the ask and delegate calls. The library does not explicitly define a response to any message. A response to a message is just another message and it may not even come from the actor who the original query was sent to. So when using the ask method, which expects a response, we have to specify which type of message we expect the response to be. All other messages will then be ignored until this type of message is received (just checking the type probably won’t be enough eventually).
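To make the "wait for a type, ignore the rest" behaviour concrete, here is a minimal sketch using only a Base `Channel` as the inbox. `wait_for` is a hypothetical helper written for this post, not luvvy's `ask`, and it makes the simplifying assumption that ignored messages are collected in a list rather than handled in whatever way the library chooses.

```julia
# Hypothetical helper, not luvvy's implementation: take messages from an inbox
# until one of type T arrives. Everything taken before that is set aside,
# mirroring the "ignored" behaviour described above.
function wait_for(inbox::Channel, ::Type{T}) where T
    ignored = Any[]
    while true
        msg = take!(inbox)
        if msg isa T
            return (msg, ignored)
        end
        push!(ignored, msg)
    end
end

inbox = Channel{Any}(8)
put!(inbox, "a stray String message")
put!(inbox, :some_symbol)
put!(inbox, 42)            # the Int "response"
put!(inbox, 7)             # stays queued; we stop at the first Int

reply, ignored = wait_for(inbox, Int)
```

It also shows why checking only the type is fragile: any Int from any sender would be accepted as the answer.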

If an actor is ignoring messages waiting for one in particular, then this creates a potential deadlock scenario if two actors are blocking, waiting for each other to respond. This happens in the above scenario if Brian and Nigel both ask each other how popular they are at the same time. Both will ignore each other while waiting for an integer response.

This is where the delegate method comes in. This creates a new actor, called a Stooge, who handles this for them so they can continue to process messages. The API here is maybe a little ugly because we want the user to realise they shouldn’t capture variables within the function closure, but pass them instead so that they can be copied if necessary (they currently are not). Of course if it is possible to inspect and edit Julia function closures, then this can be made to look better.
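The idea of passing values explicitly instead of capturing them can be sketched with Base tasks. `delegate_sketch` below is a hypothetical stand-in, not luvvy's `delegate`: it assumes the arguments are deep-copied (which, as noted, luvvy does not currently do) and that the result comes back as a message on a reply channel.

```julia
# Hypothetical stand-in for a Stooge: run `f` in its own task on copies of
# `args`, and deliver the result to `reply` as a message.
function delegate_sketch(f, reply::Channel, args...)
    copies = map(deepcopy, args)          # the helper never sees the caller's objects
    @async put!(reply, f(copies...))
end

scores = [1, 2, 3]
reply = Channel{Any}(1)

# The do-block receives the copy as `s`; it does not close over `scores`.
task = delegate_sketch(reply, scores) do s
    push!(s, 4)        # mutates the copy only
    sum(s)
end

result = take!(reply)   # the caller could process other messages before this
wait(task)
```

Because the do-block only touches its parameters, the caller's `scores` is unchanged after the helper has run.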

I should point out that there is already an actor library called Actors.jl by @oschulz however I wanted to try something quite different. Although perhaps some underlying code and know-how could be shared.

For more discussion please see the README. Also the source is currently only ~250 lines without comments and is hopefully quite readable.

I’m glad to see this - I think Julia is a great platform for Actors!

Ages ago, I wrote the small (quite limited) Actors.jl. It was registered for Julia v0.x, but it was more of an early test. It then lay dormant for a long time, waiting for the day when Julia would support multithreaded tasks. Now we’re getting just that in v1.3 (and with bells on) - and I had originally planned to reactivate Actors.jl at that point. But unfortunately, I won’t manage to find time for this for the foreseeable future.

So thanks for luvvy, @richiejp!

May I ask how the Actor model copes with failures? Perhaps I should actually read the documentation…
In the Exascale era the current model is clusters of nodes connected by low-latency networks. If one node fails, the whole computation fails - hence the current work on fast burst buffer storage for checkpointing. To be honest, I know there is also work on fault-tolerant MPI.
Talking loosely, can we cope with an Actor failing?

@oschulz thanks for the encouragement!

@johnh This is probably one of the most interesting aspects of the actor model, even though the core model doesn’t enforce any particular error handling. Indeed I do talk a bit about this in the README under the section of things I haven’t really looked at yet.

However I do know that Erlang OTP applications are famous for being indestructible. I attribute this to its generic methods for error handling which can deal with unexpected failure modes. Basically you allow errors to kill a bunch of your actors until an error reaches some special actor whose only job is to restart them and maybe notify other actors that they may need to resend messages. You may lose some intermediate computations, but you won’t have to restart the whole system. Things become more problematic if some actor performs I/O and you really need to know what side effects it had, but generally speaking the “turn it off and on again” approach works most of the time.
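As a rough illustration of the "special actor whose only job is to restart them" idea, here is a sketch in plain Julia. The names `worker` and `supervise`, the `:poison` job and the restart limit are all hypothetical; this is neither Erlang's OTP nor anything luvvy provides, and it runs in a single task for simplicity.

```julia
# Hypothetical worker: fails on a "poisoned" job, otherwise doubles the job.
function worker(jobs::Channel, results::Channel)
    for job in jobs
        job == :poison && error("worker crashed")
        put!(results, 2 * job)
    end
end

# Hypothetical supervisor: its only job is to restart the worker when it dies.
function supervise(jobs::Channel, results::Channel; max_restarts = 3)
    restarts = 0
    while true
        try
            worker(jobs, results)
            return restarts              # jobs channel closed: normal shutdown
        catch
            restarts += 1
            restarts > max_restarts && rethrow()
        end
    end
end

jobs = Channel{Any}(8)
results = Channel{Any}(8)
foreach(j -> put!(jobs, j), (1, :poison, 2, 3))
close(jobs)

restarts = supervise(jobs, results)
close(results)
done = collect(results)
```

The poisoned job is simply lost, which matches the point above: some intermediate work disappears, but the remaining jobs are still processed without restarting the whole system.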

I suppose the actor model provides some nice primitives to work with when figuring out how to handle failure. They naturally isolate chunks of code and data. The messages can be recorded and reused in a way that would be difficult with, for example, function calls. This is nothing you can’t do some other way, but the actor model (if used correctly) provides some invariants which allow for a generic error handler to be used on any computation.

Frankly though I need to get my hands dirty with a few toy use cases which really need to be robust (e.g. bank transactions) and bombard them with fault injection.

Yes, actor models have a let it crash philosophy in which processes are structured in a tree structure (the supervision tree), where parents act as supervisors (processes that do nothing but handle dead workers or other supervisors) and the leaves are usually workers (processes that do something, which may fail for any reason). Once a worker fails, it is recreated in its initial state (everything it did is lost), so if you need it to recover its previous state you need to keep that state in another actor that does not crash as easily, or in an external service (file system, database, Kafka, Redis…), which also protects against a full VM crash/restart.

The actor library is definitely responsible for keeping the actor alive (through the supervisor, and as long as the programmer opts for a restart strategy), but checkpointing and state recovery are the responsibility of the programmer/actor, considering that each case is unique (serializing a Flux model’s weights to the file system is completely different from updating a DB table based on a dataframe). The actor library is also responsible for maintaining the mailbox (and maybe, for example, allowing the process to restart with the latest read message to avoid simply skipping it, as long as the actor is idempotent).
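The split of responsibilities described here can be sketched as follows. Everything in this block is hypothetical: `Checkpoint` stands in for an external store, `handle!` is the programmer's idempotent handler, and `run_mailbox!` plays the part of a library that redelivers the latest read message after a failure.

```julia
# Hypothetical checkpoint store standing in for a file, database, or a sturdier actor.
mutable struct Checkpoint
    total::Int
    seen::Set{Int}     # message ids already applied, which makes replay idempotent
end

# Hypothetical handler: fails once on message id 2 to simulate a crash.
function handle!(cp::Checkpoint, id::Int, amount::Int, crashed::Ref{Bool})
    id in cp.seen && return cp.total          # already applied: replay is a no-op
    if id == 2 && !crashed[]
        crashed[] = true
        error("simulated crash before commit")
    end
    cp.total += amount
    push!(cp.seen, id)
    return cp.total
end

# Hypothetical mailbox loop: on failure, the same message is delivered again.
function run_mailbox!(cp::Checkpoint, mailbox)
    crashed = Ref(false)
    redeliveries = 0
    for (id, amount) in mailbox
        while true
            try
                handle!(cp, id, amount, crashed)
                break
            catch
                redeliveries += 1      # "restart" and retry with the latest read message
            end
        end
    end
    return redeliveries
end

cp = Checkpoint(0, Set{Int}())
redeliveries = run_mailbox!(cp, [(1, 10), (2, 20), (2, 20), (3, 5)])
```

Tracking applied message ids is one simple way to get idempotence; the duplicate delivery of message 2 in the mailbox is absorbed without double counting.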

There might be some different considerations between Erlang’s actor model and a potential Julia actor model though. Erlang’s first priority is low latency, and no process can stay in execution for more than a little while once it starts running (since it’s immutable, the scheduler can safely interrupt any non-dirty process at any point, which compromises performance), while Julia programs focus on maximum throughput for cpu heavy tasks (which means less switching) and somewhat large data which might be inconvenient to simply pass as a message. Though Akka/Scala is in a similar scenario and does great (I don’t know much about it though), and the actor model is a fantastic way to model stuff like web services which Julia could potentially do very well.

Yes, actor models have a let it crash philosophy in which processes are structured in a tree structure

Not necessarily, this is perhaps how Erlang works, but the actor model in general would allow you to use any topology.

while Julia programs focus on maximum throughput for cpu heavy tasks … and somewhat large data which might be inconvenient to simply pass as a message.

Again this might be true for most current applications of Julia, but is not necessarily the case, with some (or a lot of) work it could be used in a soft-realtime system IMO. I don’t see any issue with the core language, just how Task cancellation, memory safety and related issues are handled (which is being worked on to some extent).

actor model is a fantastic way to model stuff like web services which Julia could potentially do very well.

Absolutely agree! It would be great to have actors running on the server and client (WASM). I had something like that working in Rust with Actix on the server and Yew on the client. It would have been nice to have Actix on both, but was still great.

Thank you very much @richiejp for luvvy, I appreciate your amazing effort. One small observation, though: is it luvvy with a small l? My understanding is that all package names should start with a capital letter. I also noticed that you changed the name from luvvie to luvvy, which is better IMO. Thanks again.

My understanding is that all package names should start with a capital letter

If this turns into a major package, and you judge it time to register it, @richiejp - the package name “Actors” is available again now, if you want it.

Yes, certainly, Actors is a really good name and you can contribute your ideas to the new package and unify your efforts. I’m sure the idea of actors is great, new languages are built around that concept like the Pony language.

the package name “Actors” is available again now, if you want it.

Thanks! OK, I will probably switch it to Actors(.jl), but I was thinking that maybe it would be a good thing to have two libraries, one which is quite conventional and allows you to use actors lightly and another which treats them more like a basic programming primitive.

Now I think it would make sense to have a core, zero-dependency library called Actors.jl and then some other packages which handle remote actors and whatever else. It appears the same core code can be used regardless of how much you wish to embrace the actor model.

I will await further feedback (or lack thereof) before renaming anything or registering though.

I’m sure the idea of actors is great, new languages are built around that concept like the Pony language .

Thanks for that. I have heard of it before, but never really looked at it.

Actually, Actors.jl was always intended to get remote actors as well, I just never got around to it. I wanted to build something like a fusion of Erlang and Scala/Akka actors, originally - so personally, if you’re going to build a fully featured, local+remote, thread+processes Actors framework incl. supervision, etc., feel free to grab the name “Actors”. I don’t think I’ll be able to contribute much for the foreseeable future, unfortunately - too many other Julia projects going on, currently.

Thanks for this! I want to have a look in detail but I haven’t had the chance yet.

Being rather Erlang-naive, I do wonder how much of Erlang reliability is from low level design decisions made in the strict processes isolation model vs the use of Actors per se. One of the claims I’ve seen is that the OTP acts more like an operating system than the typical language runtime.

I suppose that the actor model is analogous to processes which communicate via sockets or micro-kernels using message passing. These things don’t automatically make your code robust, but they give you a clearly defined boundary where you know information is only passed via a limited number of methods. In the actor model this is strictly message passing. For processes on a modern OS you also have various types of shared memory and interrupts which complicate things (of which I am sure you are well aware). However if you limit yourself to the actor model then you know you can take advantage of whatever isolation the underlying system uses, although you still have to do a bunch of work to recover from an error. Probably I will simply aim to make the OS’s isolation features easily usable rather than reimplement these features in user land. For error recovery, there is simply a lot of work to do there; probably most of the actor library will end up being helper actors to deal with (re)starting, monitoring and distributing other actors.

Also in a strict implementation of the actor model, you have well defined state transitions which can be modeled as tuples of the actor state and message. These could be analysed and mutated to avoid ‘absorbing states’, albeit at great cost. So this might be another way which the OTP benefits from the actor model, but I’m not sure how much that is deployed in practice.
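A toy version of "state transitions as tuples of the actor state and message" can be written down directly. The states, messages and table below are invented for illustration, and `transition` and `is_absorbing` are hypothetical helpers; the exhaustive check only works because this example is tiny, which hints at the "great cost" for real actors.

```julia
# Hypothetical two-message protocol for a tiny actor with three states.
const STATES = (:idle, :busy, :dead)
const MESSAGES = (:start, :finish)

# The transition table: (state, message) => next state.
const TRANSITIONS = Dict(
    (:idle, :start)  => :busy,
    (:idle, :finish) => :idle,
    (:busy, :start)  => :busy,
    (:busy, :finish) => :idle,
    (:dead, :start)  => :dead,
    (:dead, :finish) => :dead,
)

transition(state, msg) = TRANSITIONS[(state, msg)]

# A state is absorbing if no message can move the actor out of it.
is_absorbing(state) = all(transition(state, m) == state for m in MESSAGES)

absorbing = [s for s in STATES if is_absorbing(s)]

# Replaying a message log is just a fold over the transition function.
final = foldl(transition, (:start, :start, :finish); init = :idle)
```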

A few developments:

  • New address system which allows creating and assigning address-actor pairs on any thread (using a structure which allows address lookups while updating)
  • Created docs page: Guide · Actors
  • Support for asynchronous tasks which report failure as a message back to the actor which started them. So you don’t have to explicitly wait or fetch on a task.
  • Created a small POC web server (still at Richard Palethorpe / Luvvy · GitLab)
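The third development above, tasks that report failure as a message, might look roughly like this in plain Julia. `TaskDone`, `TaskFailed` and `async_report` are hypothetical names made up for this sketch and are not the library's API; the inbox is an ordinary Base `Channel`.

```julia
# Hypothetical message types, not luvvy's.
struct TaskDone
    value::Any
end
struct TaskFailed
    error::Exception
end

# Run `f` asynchronously; success or failure both arrive in `inbox` as messages,
# so the starting actor never has to wait or fetch on the task itself.
function async_report(f, inbox::Channel)
    @async try
        put!(inbox, TaskDone(f()))
    catch err
        put!(inbox, TaskFailed(err))
    end
end

inbox = Channel{Any}(4)
t1 = async_report(() -> 1 + 1, inbox)
t2 = async_report(() -> error("connection dropped"), inbox)
wait(t1); wait(t2)     # only needed here so the script can inspect the inbox

messages = [take!(inbox), take!(inbox)]
```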

Some things I haven’t done:

  • Remote actors
  • Some kind of Actor hierarchy so that shutdown and failures are handled more gracefully (somewhat ironic given my previous comments)
  • Explicit support for timeouts
  • Message tracing

Now I am thinking about trying to use this to solve a real problem (or at least a problem which is not created by software in the first place). It is obviously not finished, but I am not sure what is really important for making it work or proving it’s a bad idea :smiley:

I implemented an actor hierarchy so that I have a way of dealing with actors which fail due to connection drops in a web app I am creating with Luvvy.

https://palethorpe.gitlab.io/Actors.jl/reference/#Actors.TreeMinder

This was done without modifying the actors themselves. Each actor already has a link to a ‘minder’, which they call to when they get into trouble or need a common service, so I created a hierarchy of minders. Potentially the user can replace the minders with their own implementation.
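For readers who want a feel for a hierarchy of minders, here is a small sketch. It is not how TreeMinder is implemented; the `Minder` struct, its `handles` set and the `escalate` function are assumptions made purely for illustration of a failure walking up a tree until someone deals with it.

```julia
# Hypothetical minder: knows its parent and which failures it can handle itself.
struct Minder
    name::String
    parent::Union{Minder, Nothing}
    handles::Set{Symbol}
end

# Walk up the tree until some minder accepts the failure.
function escalate(m::Minder, failure::Symbol)
    failure in m.handles && return m.name
    m.parent === nothing && return "unhandled"
    return escalate(m.parent, failure)
end

root    = Minder("root", nothing, Set([:crash]))
web     = Minder("web", root, Set([:connection_drop]))
session = Minder("session", web, Set{Symbol}())

handled_by = escalate(session, :connection_drop)
```

Since the actors only hold a link to their minder, swapping in a different minder type would not require touching the actors, which is the property described above.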

This also means shutdown is now less chaotic.