Threading in Julia

#1

I have an application where I want to launch two Julia functions as separate threads. Those threads have a shared matrix they each read from and add extra columns to. Any ideas as to how to go about this?

One major concern I had was something I read in the documentation (see the end of this post) saying that OS level threading is not supported - all tasks are just switched between on the main thread. This is not what I want, particularly because my application is compute-bound.

An approach I’ve been looking at is using remotecall (or @spawnat or @spawn) and fetch, but the call to fetch is blocking, which is not what I want. Do I need to use @async with the fetch call? Also, once one thread is completed, I don’t need the results from the other thread, so I don’t want to have to wait for both threads to complete with @sync.
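To make this concrete, here is a sketch of that approach: wrapping each `remotecall_fetch` in `@async` so the fetches don't block the main task, then taking whichever result lands in a `Channel` first (so there's no `@sync` waiting on both). `slowsim` is a made-up placeholder for the two computations:

```julia
using Distributed
addprocs(2)

# Hypothetical stand-ins for the two computations; the slower
# one's result is simply ignored.
@everywhere slowsim(t, name) = (sleep(t); name)

results = Channel{Symbol}(2)
@async put!(results, remotecall_fetch(slowsim, workers()[1], 0.2, :fast))
@async put!(results, remotecall_fetch(slowsim, workers()[2], 2.0, :slow))
winner = take!(results)   # blocks only until the FIRST result arrives
```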

Another approach I saw would be the LocalCluster manager. This looks like it would be closer to what I want (if the condition variables talked about are analogous to those for p-threads), but I haven’t found any examples of how to use this.

I’ve also looked at using the Julia MPI package. Can the MPI package launch Julia code on separate threads if used locally? Or does it only work with a cluster of machines?

And now it seems like there is an experimental implementation of threading mentioned in the documentation but of course there aren’t any examples of using @threadcall. And it seems to have the same problem where it’s not actually launching new threads, just switching between multiple tasks on the same thread.

Then there are Tasks (aka coroutines) which also seem to just be switching on the same thread.

Finally, I saw a SharedArray class which looked like it would be helpful, but I found no way to add extra rows or columns to it. Is there a way to add extra rows or columns to a SharedArray and have the changes be reflected on all the threads?

What are your suggestions to approaching this problem? And please provide a small example if you can. Thanks!

Relevant Documentation:
All I/O tasks, timers, REPL commands, etc are multiplexed onto a single OS thread via an event loop. A patched version of libuv (http://docs.libuv.org/en/v1.x/) provides this functionality. Yield points provide for co-operatively scheduling multiple tasks onto the same OS thread. I/O tasks and timers yield implicitly while waiting for the event to occur. Calling yield() explicitly allows for other tasks to be scheduled.

@async is similar to @spawn, but only runs tasks on the local process. We use it to create a “feeder” task for each process. Each task picks the next index that needs to be computed, then waits for its process to finish, then repeats until we run out of indexes. Note that the feeder tasks do not begin to execute until the main task reaches the end of the @sync block, at which point it surrenders control and waits for all the local tasks to complete before returning from the function. The feeder tasks are able to share state via nextidx() because they all run on the same process. No locking is required, since the threads are scheduled cooperatively and not preemptively. This means context switches only occur at well-defined points: in this case, when remotecall_fetch() is called.
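The feeder pattern that paragraph describes can be sketched as follows (`work` is a hypothetical per-index computation; `nextidx` is the shared closure the docs mention, safe without a lock because tasks on one process only switch at the `remotecall_fetch` yield point):

```julia
using Distributed
addprocs(2)

@everywhere work(i) = i^2   # hypothetical per-index computation

function run_feeders(n)
    results = Vector{Int}(undef, n)
    i = 0
    nextidx() = (i += 1; i)   # shared counter; no lock needed,
                              # tasks switch only cooperatively
    @sync for p in workers()
        @async while true
            idx = nextidx()
            idx > n && break
            results[idx] = remotecall_fetch(work, p, idx)
        end
    end
    return results
end

run_feeders(4)
```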


#2

If you need real threading (and from what you want to do it seems that you do), then none of @async, @spawn, @sync, @parallel, or @threadcall is relevant.

The only experimental feature that’s related is @thread.
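For reference, a minimal sketch of that experimental feature (exposed as `Threads.@threads` in recent Julia builds; getting more than one OS thread requires starting Julia with `JULIA_NUM_THREADS` set, e.g. `JULIA_NUM_THREADS=4`):

```julia
using Base.Threads

function threaded_squares(n)
    out = zeros(Int, n)
    # Loop iterations are split across the available OS threads;
    # each index is written by exactly one thread, so no locking.
    @threads for i in 1:n
        out[i] = i^2
    end
    return out
end

threaded_squares(8)
```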


#3

Hi

I’m also keenly watching threading’s evolution; however, the impression I get is that threading is going in a different direction than I was hoping for. I might be wrong, so please correct me if anyone knows better. The different direction might even be a good thing, and it has started me rethinking my traditional approach. (My use case for wanting threads was dedicating some threads to network comms while other threads do the computations.)

My impression is that the developers are trying to stay away from all the problems of threads related to memory contention, requiring mutexes, semaphores, etc. Thus threads are being designed to just work on parallel execution of the same function, where they hopefully don’t step on each other’s toes.
Add to that the complexity of having multiple JIT engines running in parallel…

Anyway back to your question:

  • Matrices are not really designed to grow dynamically. Adding anything will cause the whole matrix to be reallocated, with the old data copied over. This is going to be slow and probably not what you really want.
  • I’m not sure I understand your intention, since you mention you don’t care about the other thread’s answer… Is that after adding one column, or after adding multiple columns?
  • Are you sure you need threads? Why don’t you want to use Julia’s parallel-processes approach? The various built-in methods, or MPI, should provide the communication you need between those processes.
  • It almost sounds like you have two algorithms to calculate an answer and want to run both in parallel, using whichever gives the faster answer. I guess that could work, but it opens more questions. Do you want to do this multiple times? If so, you probably need to think about interrupting the slower one so it can move on to the next iteration. This could affect which comms mechanism is easiest to use.
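The first point above is easy to demonstrate: a `Matrix` cannot grow in place, so `hcat` allocates a brand-new array and copies both operands into it, leaving the original untouched:

```julia
A = zeros(2, 2)
B = hcat(A, ones(2, 1))   # new 2x3 matrix; A itself is unchanged

size(A)   # still (2, 2): A was not resized
size(B)   # (2, 3): a fresh allocation holding copies of A and the new column
```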

I guess most of that comes down to better understanding the details of your situation. And asking the question whether there are alternative ways to solving the problem.


#4

No.

No. There’s no such limitation now, after all, this is not even well defined… There are a lot of limitations and that’s why it’s still experimental.

FWIW, this does not happen automatically.


#5

Good to know, I misunderstood the direction things are going.

On matrix growth: I probably wasn’t fully clear, so thanks for making it more explicit. I know it doesn’t happen automatically (unlike MATLAB), and whether it happens automatically (hidden from view) or is done manually, it would not be a good design. Pre-allocating to the full size and then populating later, along with some flags indicating which columns are valid so far, might be one way to do it, but that didn’t seem to be the OP’s intention.