@schedule considered harmful

design

#1

I read this article from Hacker News about asynchronicity today and decided to
see how well the concepts there mapped to scheduled Tasks in Julia (I haven’t
had occasion to use Tasks myself yet). To my gratification, the approach to
asynchronicity advocated in the article is already one of the most natural ways
to do async in Julia.

For those who don’t have time to read the article, it basically argues that
scheduling a Task/coroutine to run on a “background” thread (@schedule in
Julia) is a misfeature in much the same way as goto is a misfeature: both, if
they occur inside a function, break assumptions programmers usually make about
the effects of calling a function, and do so in a way that cannot be detected
at the call site. As an example in Julia, consider if somewhere deep in the dark
crevasses of a library you’re using, a function schedules something that never
yields; since Julia has a single global scheduler (afaict), anything you
scheduled after that would never run.
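
A bounded version of that scenario can be tried directly. The sketch below is an illustration, not something taken from the article: the first task busy-waits for 0.2 seconds without yielding (a stand-in for "never yields"), and the names `order`, "hog" and "polite" are made up for the example. It assumes both tasks run on the same thread, which is how @async tasks behave.

```julia
order = String[]

@sync begin
    @async begin
        t0 = time()
        while time() - t0 < 0.2
            # busy-wait: no sleep, no I/O, no yield(), so the scheduler never regains control
        end
        push!(order, "hog")
    end
    @async push!(order, "polite")   # runnable right away, but has to wait its turn
end

println(order)
```

The second task is ready immediately, yet it only gets to run once the first one finishes. With a loop that truly never ends, it would never run at all.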

So if we accept that ad-hoc scheduling is a misfeature, what’s the minimum set
of changes we’d need to make to Julia to get rid of it while still allowing
powerful asynchronicity?

  1. Deprecate @schedule and schedule for use in library code.
  2. Change @async so that it only schedules Tasks when inside a @sync
    block.
  3. (Optional) Implement error propagation as described in the linked article.

As far as I can tell, that’s it (I’m sure there’s something I’m missing). Any
thoughts? Is there something you can currently do with @schedule that couldn’t
be done by wrapping it up in a @sync block? If so, should that be done?
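
For reference, the pattern the proposal would make mandatory already works in Julia today. This is a minimal sketch; `run_jobs` and its `delays` argument are hypothetical names, and `sleep` stands in for real I/O.

```julia
# Every task started inside the @sync block is guaranteed to have finished
# by the time run_jobs returns, so the caller never sees leftover background work.
function run_jobs(delays)
    finished = falses(length(delays))
    @sync for (i, d) in enumerate(delays)
        @async begin
            sleep(d)            # stand-in for I/O; yields to the scheduler
            finished[i] = true
        end
    end
    return finished
end

flags = run_jobs([0.05, 0.01, 0.03])
println(all(flags))
```

The three sleeps overlap, so the call takes roughly as long as the longest delay rather than the sum.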


#2

Interesting idea - I remember seeing Martin Sústrik’s libdill a while ago which has a similar architecture, and it seems easier to reason about.

One thing that it seems you wouldn’t get by using @async and @sync for these blocks is a scoped context for task synchronization (his nurseries). For example, you mention changing @async so it only works within a @sync block, but what if I just take a big ball of task launching and have:

@sync run_my_program()

Then it seems like you lose most of the advantages he describes.

It would be cool to see a package (TaskContexts.jl?) that implemented these ideas without needing language changes. Obviously you wouldn’t get any of the guarantees because somebody else’s code could @schedule at any time, but it would be a good way to demo that the idea is viable, and see if other packages would adopt it.

One place I see this approach being problematic is with interactive use at the REPL, where you might want to spawn a background task and then hand control back to the user. For example, when I open up a stream to the sound card I launch a background task that services the periodic OS audio callback and handles any ongoing record/playback operations. Likewise the plotting packages pop up a window and need to launch background tasks to handle user input. I’m not sure how that would work in a structured concurrency context without blocking the REPL.


#3

Caveat: I’m far from an expert on this subject. I use @async blocks for I/O, and I don’t think a sync block is needed (or desirable) in this case. For instance, if I’m reading STDOUT from an external process, and I’m uncertain of whether there will be any data at all to read, I wrap that function in an @async block (see here for an actual example). I’m not sure what an outer @sync block will buy me in this case.

I may be misunderstanding the issue, but it seems like the argument is that @async can block a program from completing. To me this sounds like a category of bugs that are to be avoided (by calling yield() inside anything run under @async), but is not fundamentally worse than any other kind of bug that can have the same effect (such as a `while true` with no break).


#4

That’s true. With my proposal, you’d basically get the equivalent of explicitly passing around the nursery object in trio, but everywhere. Outside of @sync blocks, you’d still get the benefit of knowing that calling a function won’t spawn some background thread you don’t know about, but inside a @sync block, reasoning would be just as difficult and non-local as today. So what we want is probably a function only defined inside of @sync blocks (since we can’t really pass around a macro, though the syntax for such would be cleaner) that has to be explicitly passed around to spawn new tasks.
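
A rough sketch of what such an explicitly passed spawn function could look like, using only what Base offers today. Everything named here (`with_nursery`, `spawn`, `worker`, `events`) is hypothetical, and nothing stops other code from calling @async or schedule directly, so this only demonstrates the shape of the idea.

```julia
# Run `body(spawn)`, then wait for every task started through `spawn`
# before returning. A failed child surfaces as a TaskFailedException from wait.
function with_nursery(body)
    tasks = Task[]
    function spawn(f)
        t = Task(f)
        push!(tasks, t)
        schedule(t)
        return t
    end
    try
        return body(spawn)
    finally
        i = 1
        while i <= length(tasks)   # children may spawn more tasks while we wait
            wait(tasks[i])
            i += 1
        end
    end
end

events = String[]

# A function can only start tasks if the caller hands it `spawn`,
# so the possibility of concurrency is visible at the call site.
function worker(spawn, name)
    spawn() do
        sleep(0.01)
        push!(events, name)
    end
end

with_nursery() do spawn
    worker(spawn, "a")
    worker(spawn, "b")
end

println(sort(events))
```

Because `spawn` is an ordinary value, a function that does not receive it has no sanctioned way to start a task in this scope.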

I think you’re right that if we want to push this as the standard
task-scheduling paradigm in Julia Base or stdlib, it needs to be developed more
thoroughly as an independent library first; there’s more complexity to the
changes than I had originally thought, and I think the error-handling paradigm
is more important than I originally estimated. I’ll place developing something
like that into my project backlog; maybe I’ll get to it by 2022…

On the note of interactive use, to be honest, I don’t really see why scheduling
a task interactively without explicitly spawning a new thread or process for it
is something people should expect to work.


#5

the argument is that @async can block a program from completing

The argument is that @async or schedule() or @schedule change the behavior of
a function call in a way that isn’t visible at the function call-site.

As an individual writing code, you’re definitely not gaining any expressive
power by being forced to wrap asynchronously scheduled code inside a synchronous
block, just as modern languages don’t gain expressive power by not having
goto or by limiting its use in some pretty fundamental ways.

The argument is that if the entire language adopted this approach as the only
one it used for scheduling tasks, reasoning about the effect of a particular
function call becomes a lot easier: you know that when a normal function
returns, all tasks that were scheduled in that function have already completed.
Basically, in return for giving up some basic expressive capabilities of the
language, you gain the ability to operate at a higher level of abstraction and
more easily/confidently compose functions. Does that help explain the argument
I’m making?

Note that I haven’t looked at your particular use-case, but if it would help, I
can take a look this weekend and suggest how it might be done in the paradigm
I’m advocating.
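
For a general idea of the shape (a generic sketch, not the code from post #3): the reader task lives inside a @sync block, so it cannot outlive the function that started it. `read_output` is a hypothetical helper, and the usage line assumes a Unix-like system where `echo` is available.

```julia
# Collect the lines a command writes to its standard output.
# The reading task is scoped to this function by the @sync block.
function read_output(cmd::Cmd)
    lines = String[]
    open(cmd) do proc
        @sync begin
            @async for line in eachline(proc)   # ends when the process closes its output
                push!(lines, line)
            end
            # other work could run here while the reader waits for data
        end
    end
    return lines
end

println(read_output(`echo hello`))
```

If the process writes nothing, the loop simply ends when the output stream closes, and the function returns an empty vector.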


#6

I had a long conversation with Nathaniel (Trio author) about this architecture (while we were wandering around New Orleans at a conference in October), and I really like the design: knowing that every task finishes before a function returns, being able to abort I/O tasks, the nice error handling properties. It’s all really good and I’m glad he wrote it all up (and implemented it for Python), because I couldn’t remember all the details. I’d love to explore this kind of structured approach in Julia. Making it mandatory would have to be 2.0 material since it’s breaking, but we can always allow people to opt into it before then.


#7

After thinking more about it, I think I now see the purpose of this architecture, and it is indeed very interesting and appealing.

I’m not able to visualize how to apply it to blocking I/O problems, though. Say you have a game that reads the keyboard for input, and you implement it like this, Julia-style, in rough code:

channel = Channel{Char}(32)  # buffered channel of key presses

@async while true
    k = read(stdin, Char)  # this blocks until a key is pressed
    put!(channel, k)       # put the key in the channel
end

# main game loop
function run_game()
    while true
        # do game stuff
        yield()  # give the async task a chance to run (a goto in disguise!)
        if isready(channel)
            k = take!(channel)
            # take action required by k
        end
    end
end

run_game()

With this approach, the code to read the keyboard runs asynchronously but I don’t expect it to ever complete! If I put it in a nursery inside run_game, then the game will block forever. Maybe the solution is to put both in a “top-level” nursery?


#8

Yep. If you had something that needed to run indefinitely, you’d wrap it in a top-level nursery. This has a couple of advantages:

  1. It’s more explicit about what’s going on. If someone wanted to run run_game asynchronously with other code, they’d have to explicitly wrap it in their own nursery.

  2. Cancellation and errors in general now work as expected.
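
A bounded sketch of that arrangement, with @sync playing the role of the top-level nursery. The names (`main`, `run_game`, `max_frames`) are hypothetical, a string stands in for the keyboard, and since Julia tasks have no built-in cancellation, closing the channel is used here as a stand-in for it.

```julia
# Game loop: drains pending key presses once per frame.
function run_game(keys::Channel{Char}; max_frames = 3)
    handled = Char[]
    for _ in 1:max_frames
        yield()                       # let the input task run
        while isready(keys)
            push!(handled, take!(keys))
        end
    end
    return handled
end

# Top-level "nursery": both the input task and the game live inside one @sync.
function main(input)
    keys = Channel{Char}(32)
    return @sync begin
        @async try
            for c in input            # stand-in for blocking keyboard reads
                put!(keys, c)
            end
        catch err
            # a closed channel means the game is over; anything else is a real error
            err isa InvalidStateException || rethrow()
        end
        try
            run_game(keys)
        finally
            close(keys)               # "cancel" the input task when the game ends
        end
    end
end

println(main("wasd"))
```

When the game loop ends, the channel is closed, the input task winds down, and only then does `main` return.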


#9

Alright, I can see the benefits. Thanks!