I have a question about multithreading. I have a vector v of pairwise thread-safe problems, say of length 16, and I have four threads.
As I understand it, if I write
Threads.@threads for i in 1:16
do_problem(v[i])
end
this will assign thread 1 to v[1…4], thread 2 to v[5…8], etc.
My problem is that in fact problems v[1…4] are much bigger than the rest, and some of the problems further along are empty, so this assignment of threads is more or less the worst possible.
Thus my question: is there an equivalent piece of code that would instead schedule the loop above so that the threads work their way along the vector of problems, each taking whichever one is next when it comes free? That is, initially threads one to four work on v[1] to v[4]; then when the first thread finishes (maybe the one on v[3]), it moves on to v[5], the next thread to come free takes v[6], and so on.
You probably need task-based parallelism with @spawn.
Compare:
# dynamic scheduling: each task runs on whichever thread is free next
@sync for i in 1:16; Threads.@spawn begin sleep(i > 8 ? 4 : 1); @show (i, Threads.threadid()) end end
# static scheduling: iterations are chunked across the threads up front
Threads.@threads for i in 1:16; sleep(i > 8 ? 4 : 1); @show (i, Threads.threadid()) end
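If you want exactly the take-the-next-free-problem behavior from the question with a fixed pool of workers, one option is a Channel used as a shared work queue. Here is a minimal sketch (do_problem and v are from the original question; run_all is just a hypothetical wrapper name):
function run_all(v)
    jobs = Channel{Int}(length(v))             # shared queue of problem indices
    foreach(i -> put!(jobs, i), eachindex(v))  # enqueue all the problems
    close(jobs)                                # iteration below ends once the queue drains
    @sync for _ in 1:Threads.nthreads()        # one worker task per thread
        Threads.@spawn for i in jobs           # each worker takes whichever index is next
            do_problem(v[i])
        end
    end
end
Each worker pulls the next index off the channel as soon as it finishes its current problem, so the big early problems don't pin down the whole schedule.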
To be honest, I haven’t yet read how the task scheduler in Julia works, so maybe someone else will give more input, but @spawn seems to be the way to go.
@Sean_Matthews - I have been fighting the same issue. You are correct - Threads.@threads distributes the iterations in order. Threads.@spawn is closer, but has a subtlety: if one of the heavy jobs lands on the primary thread, funny things can happen on the result-collection side. I’ve only just learned about Transducers, so I’m going to take a look there.
I do have a non-registered package, ThreadPools.jl, that I’ve been using and that works pretty well. All you have to do is replace @threads with @bgthreads, and you’ll get the more optimized load behavior, but you’ll lose the primary thread.
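For the loop in the original question, that drop-in replacement would look like this (a sketch based on the description above; do_problem and v are from the question):
using ThreadPools

ThreadPools.@bgthreads for i in 1:16
    do_problem(v[i])    # jobs are handed to background threads as they come free
end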
It has a couple of example demos that visually demonstrate the differences between the methods - for anything approaching a uniform workload, the existing methods are best, but for a heavily nonuniform workload, keeping the scheduler on the primary thread and the workers on background threads is a win. Example demo output (a snapshot of which of 150 jobs, some of them 10x the work of the others, is running on each thread):
I still need to improve the API for the “active stack” case, though - it works well in my use case, but needs wrapping in a simpler interface. And if I look at Transducers and decide it gets close, I’ll just submit PRs in that direction.
Interesting package - I think I have a use case for it in my project. Originally I planned to communicate with the Julia process from some external app in a different language, but after getting familiar with the features from 1.3, I started thinking about writing a web-API service in Julia that will schedule calculations as threads and track their progress itself.
Cool. The package is currently in the registration-wait cycle, and should come out in a couple of days. I’m adding some features I need (a result iterator, optional job logging for tuning purposes, and a simple wrapper for the active-stack case) for v0.2.0, hopefully in the next week. For my oddball use case, anyway, it has been a huge time-saver.
Don’t quite have the hang of this website yet (it has just told me that I should not post lots of separate notes - which makes sense) - anyway, just to say thanks for the feedback, all of which is useful.
@Sean_Matthews, here is an activity log excerpt from my current project that behaves exactly as you describe, since I am in the middle of this right now. Here, the columns are the threads (there are 4, with the primary being used for management), and the numbers are the task ids being worked on at that moment. (I’ve annotated the headers.)

You’ll notice that some threads get stuck on a long job, while others crank through the short ones until they also hit a long one. For example, in the first time step, Thread 2 gets through two jobs (1642 and 1643), ending at 1644, while Threads 3 and 4 are stuck on 1637 and 1641, respectively. In the next time step, 1645 and 1646 are taken on by Threads 3 and 2, while Thread 4 is still working on 1641. And so on. This was achieved by just using ThreadPools.@bgthreads instead of Threads.@threads. @fgthreads keeps the primary thread in play, and is usually faster except for really nonuniform loading (like I have).