A good coroutines usage example (especially for v0.6)?

I am quite new to the concept of coroutines. I have a situation in which I think it is appropriate to use them, and I’m sure I can figure out how to do that, but I’m not so sure I can figure out how to do that right. Is there a package out there which would serve as a really good example for working with coroutines? An example for v0.6 in which produce and consume are deprecated would be especially nice, as Channels seem to change things a bit. Thanks in advance.

Addendum: Also, I am quite curious as to whether there are concrete plans to allow tasks to be scheduled on multiple threads. That seems to have some pretty big implications for the usefulness of coroutines.


Coroutines have a lot of usage scenarios. The producer-consumer pattern (represented by the produce() and consume() functions earlier, by tasks and channels now, or by generators in Python, for example) is one of the most popular. Another one is using @sync / @async. Let's say we want to fetch 100 web pages. We can do it like this:

urls = ...
pages = Vector{Any}(100)
for (i, url) in enumerate(urls)
    pages[i] = fetch_page(url)
end

This way we first download the 1st page, then the 2nd, then the 3rd, etc., each time spending, say, 100ms, or 10 seconds altogether. Clearly, this job is IO-bound, while the CPU sits almost idle.

Now let’s change the code a bit:

urls = ...
pages = Vector{Any}(100)
@sync for (i, url) in enumerate(urls)
    @async pages[i] = fetch_page(url)
end

Now each page is downloaded in its own task, and tasks are switched whenever one reaches an IO operation. Instead of waiting 100ms for the 1st page to download, Julia switches to the 2nd task/page almost immediately, then to the 3rd, etc. The whole job finishes in, say, 200ms instead of 10 seconds, so you get a huge performance boost without multithreading.
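Since the original question mentioned the deprecated produce/consume, here's a minimal sketch of how the same producer-consumer pattern looks with a Channel in v0.6 and later (the producer function and values are just illustrative):

```julia
# Producer-consumer with a Channel, replacing the deprecated produce()/consume().
# The function passed to Channel runs in its own task and fills the channel;
# the channel is closed automatically when the function returns.
function producer(ch::Channel)
    for i in 1:5
        put!(ch, i^2)   # blocks when the channel buffer is full
    end
end

ch = Channel(producer)  # unbuffered by default; each put! waits for a take!
squares = collect(ch)   # iterating the channel consumes until it is closed
println(squares)        # [1, 4, 9, 16, 25]
```

Iterating (or collect-ing) the channel plays the role the old consume() loop used to play.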


Thanks for your response and example.

I think this stuff makes a lot of sense to me when tasks are scheduled on separate processes or threads, but something is really nagging me in cases where this is all done on a single thread.

So, in your example above, I feel like the compiler (or at least the program at runtime) has to somehow know that fetch_page is going to contain some sort of IO that it’s going to have to wait for. I guess my question is, how does the runtime know that it can move on to the next call of fetch_page? Is there some sort of trigger that tells it “we are doing IO now, please move on”? If so, can it figure this out entirely on its own, or do you have to give it some sort of hint (beyond just writing @async)? And, even more confusing, how does it know when it can resume an old task? I’m assuming it doesn’t interrupt any tasks in order to resume old ones, but must wait until they are finished.

By the way, there is an example of this using channels in the 0.6 documentation, but in it they use sleep. I happen to know that sleep blocks tasks, so by my reckoning, in that example they are explicitly telling the program to move on to the next iteration, so that makes sense. It seems to suggest to me I’d have to use wait or something inside of fetch_page in your example above.
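For instance, this tiny experiment shows sleep acting as a switch point (the task names and timing are just illustrative):

```julia
# sleep() is a "switch point": while one task sleeps, the scheduler runs others.
order = String[]
@sync begin
    @async begin
        sleep(0.1)              # yields to the scheduler for ~100 ms
        push!(order, "slow")
    end
    @async push!(order, "fast") # no blocking call, so it runs to completion first
end
println(order)                  # ["fast", "slow"]
```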

Edit: I just realized that there was one other thing that was happening that was confusing the shit out of me. When I was writing test programs to figure this stuff out, my printlns were blocking! This means that sometimes I would schedule a task, it would start trying to print a line, but actually it would move on to another part of code and print another line before it finished! This was causing me to think that all sorts of strange voodoo was happening.

This is pretty much how it works. I don't know all the details, but Julia is based on libuv - an asynchronous IO library integrated with the task framework. So the example with fetch_page works something like this (provided that fetch_page uses the usual Julia network API):

  1. Inside the first call to @async, Julia creates a task to run fetch_page(urls[1]).
  2. fetch_page() runs until an IO operation is requested; libuv starts the operation and returns immediately, informing the task framework that it can switch to the next task.
  3. Julia switches back to the main task (where the for loop is running) and starts the next task.
  4. When the result of the first IO operation is ready, libuv informs Julia that it can continue with the first task.

I may be wrong in the details, but you should get a general idea.
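The park-and-resume handshake in steps 2 and 4 can be imitated in pure Julia with a Condition: wait() suspends the current task, and notify() makes it runnable again. This is a simplified sketch of the idea, not what libuv actually does:

```julia
# A Condition imitates the "IO pending / IO ready" handshake:
# wait(cond) suspends the task (like a pending read), and notify(cond, val)
# resumes it with a value (like libuv reporting that data arrived).
cond = Condition()
result = Ref("")

t = @async begin
    result[] = wait(cond)   # "start IO", then park until the result is ready
end

yield()                     # let the task run up to its wait()
notify(cond, "page body")   # "IO completed": hand the value to the waiting task
wait(t)                     # join the task
println(result[])           # page body
```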

Compare it with the code that doesn’t include any IO (or other “switch points”):

julia> A = rand(1000, 1000);

julia> B = rand(1000, 1000);

julia> @sync for i = 1:10
           @async begin
               for j = 1:100 A * B end
               println("finished task $i")
           end
       end

finished task 1
finished task 2
finished task 3
finished task 4
finished task 5
finished task 6
finished task 7
finished task 8
finished task 9
finished task 10

Not only do the tasks strictly follow each other, but the run time doesn't decrease either (in fact, it even increases a bit because of task-switching overhead).
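To confirm it really is the absence of switch points that serializes the tasks, adding an explicit yield() inside the loop lets the scheduler interleave them (a sketch with made-up task names):

```julia
# Without any blocking call the tasks run back to back; an explicit yield()
# gives the scheduler a chance to switch, so the tasks interleave.
log = String[]
@sync for name in ["a", "b"]
    @async for i in 1:2
        push!(log, "$name$i")
        yield()              # cooperative switch point
    end
end
println(log)                 # ["a1", "b1", "a2", "b2"]
```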

Ah, that makes a lot of sense.

So, I take it that if I do some sort of IO that is not calling libuv, I would have to do some work to make it “coroutine friendly”?

One more thing that I'm a bit confused by is that I'm seeing a bunch of examples with tasks bound to channels which run functions with while true ... end blocks. I think what's going on is that at some point (when they are "done") they become permanently blocked. I'm wondering if something needs to be done to clean them up. Perhaps they get "garbage collected" when the channel goes out of scope, but would I have to be extremely careful about this sort of thing if there are multiple tasks attached to a channel? If those functions are closures or something I can imagine them causing "memory leaks" (not in the usual sense of being a bug, but still being undesirable).
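For instance, in my experiments a worker task bound to a channel seems to end cleanly once the channel is closed (a sketch, with an illustrative squaring worker):

```julia
# A worker task reading from a channel exits cleanly when the channel is closed:
# iterating a closed, drained channel terminates the loop, so the task finishes
# and becomes garbage like any other unreachable object.
results = Int[]
ch = Channel{Int}(10)
worker = @async for x in ch      # iteration stops when ch is closed and empty
    push!(results, x * x)
end

foreach(x -> put!(ch, x), 1:3)
close(ch)                        # signals "done"; the worker's loop ends
wait(worker)                     # the task has finished, nothing lingers
println(results)                 # [1, 4, 9]
```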

Thanks again!

Use file descriptors and wait on them.

I believe the rules here are the same as for any other objects: as long as an object is reachable from the root, neither it nor its dependencies may be garbage collected. So you should definitely think about GC, but no more than for other objects.

By the way, you may be interested in going through the channels implementation - it's actually pretty straightforward and doesn't have any magic.

Thanks, I think I’m getting a much better sense of how this all works now. My remaining point of trepidation is that I feel like I don’t know which functions will cause blocking (i.e. pausing a task) and which won’t. (Also I’m still slightly worried about memory leaks.)

To anybody else trying to figure this out in the future: be careful interpreting the results of println! Before I realized that they were often printing asynchronously I was getting very, very confused! (It definitely seemed like “magic” at the time.)

One more question, is there some reason !(RemoteChannel <: AbstractChannel)? That doesn’t seem right…

RemoteChannel and Future are remote references and can be thought of as handles to channels. A RemoteChannel is a reference to a Channel (or an implementation of AbstractChannel) on a specific node.
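A minimal sketch of that handle idea (on Julia 1.x these names live in the Distributed stdlib; in v0.6 they were in Base, so the `using` line wouldn't apply there):

```julia
using Distributed                      # in v0.6 these names were in Base

addprocs(1)                            # start one worker process
# A handle on the master to a Channel{Int} that lives on the worker:
rc = RemoteChannel(() -> Channel{Int}(10), workers()[1])

@fetchfrom workers()[1] put!(rc, 42)   # the worker writes into its local channel
println(take!(rc))                     # 42 - the master reads through the handle
```

So the RemoteChannel itself is a remote reference (like Future), not a channel, which is why it isn't a subtype of AbstractChannel.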

This is very interesting. Can you post an example on using file descriptors and wait on them?
