I’m planning to use Julia as the language in my intro to parallel computing course next year. Earlier this month I posted some notes on GPU programming with Julia. Now I’ve just uploaded to my website some notes on multithreading with Julia, which may be of interest to others.
Spawning one task per column is (at least in my tests) even faster than your recursive spawning approach (and the code is simpler):
juliaSetCalcThread2!(pic, c, n) = wait.([Threads.@spawn calcColumn!(pic, c, n, j) for j in 1:n])

@btime juliaSet(-0.79,0.15,1000,juliaSetCalcRecSpawn!); # 8.016 ms (154 allocations: 997.05 KiB)
@btime juliaSet(-0.79,0.15,1000,juliaSetCalcThread2!); # 6.307 ms (6007 allocations: 1.80 MiB)
It is usually better to clump the work into fewer spawns if the computations are fairly uniform in time. That said, you can’t always do this. Package A might be multithreaded and call Package B, which is also multithreaded, and this same mechanism keeps the combination from running too many concurrent threads. So the use case for “many spawns” is usually going to be interactions between codes, or cases where it’s hard to split the work into equal runtimes (in that case, spawn a bunch of tasks and let the scheduler handle the load).
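To make the "fewer spawns" versus "many spawns" distinction concrete, here is a minimal sketch of both patterns. It is only an illustration: `fill_column!`, `per_column!`, and `chunked!` are hypothetical names, and `fill_column!` is a toy stand-in for the real per-column work (it is not the `calcColumn!` from the notes).

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the per-column work (assumption, not calcColumn!):
# fill column j of `out` with a simple function of the indices.
function fill_column!(out, j)
    for i in axes(out, 1)
        out[i, j] = i + j
    end
    return nothing
end

# Many spawns: one task per column, the scheduler balances the load.
function per_column!(out)
    tasks = [@spawn fill_column!(out, j) for j in axes(out, 2)]
    foreach(wait, tasks)
    return out
end

# Fewer spawns: one task per contiguous chunk of columns.
# `ntasks` is an assumed tuning knob; it defaults to the number of threads.
function chunked!(out; ntasks = nthreads())
    n = size(out, 2)
    chunklen = max(1, cld(n, ntasks))
    tasks = map(Iterators.partition(1:n, chunklen)) do cols
        @spawn foreach(j -> fill_column!(out, j), cols)
    end
    foreach(wait, tasks)
    return out
end

a = per_column!(zeros(Int, 200, 200))
b = chunked!(zeros(Int, 200, 200))
```

Both versions write to disjoint columns, so no locking is needed. Which one is faster presumably depends on how uniform the per-column cost is, so it is worth timing both with `@btime` on the real workload.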
That’s an interesting comparison. I’m still fairly new to Julia (I’m used to C/Cilk and C/OpenMP), but it’s a general principle that the number of tasks is a tradeoff between load balancing and scheduler overhead. The partr scheduler works very well!
You may want to use CuArrays.@time, otherwise you won’t be timing the GPU execution… or maybe not.
That’s why I called synchronize() at the end of my subset sum functions (in http://www.cs.unb.ca/~aubanel/JuliaGPUNotes.html), to make sure the GPU had completed its work.