The following example outline would have helped me when I was trying to learn just enough of Julia’s parallel processing capabilities to speed up the generation of about 6000 frames for a 3D plot animation, my first (and so far only) Julia programming task.
# file: msld.jl (My Slow Loop Distributed)
# (Distributed is a standard library, so no Pkg.add is needed)
using Distributed
nProc = 8 # loop partitioned over this number of CPU cores
addprocs(nProc, exeflags="--project=.")
@everywhere begin
    include("msl.jl") # initialize each worker's instance of msl
end
pmap(1:nProc) do i
    msl(0, i, nProc)
end
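One detail worth noting: addprocs spawns a fresh set of worker processes every time this script runs, so re-running include("msld.jl") in the same REPL session accumulates workers. A simple remedy (my addition, not part of the original outline) is to remove the workers once the run finishes:

# optional: after the pmap call, tear the workers down again so
# msld.jl can be re-included in the same REPL session without
# accumulating worker processes
rmprocs(workers())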
# file: msl.jl (My Slow Loop)
# some initialization so msl() can be run from the REPL
# (executed when running include("msl.jl"))
# an example of such initialization:
# using Plots, Printf
# pyplot()
# msl (My Slow Loop)
# - myArgs: whatever
# - iProc: 1-based index of CPU core
# - nProc: number of CPU cores used to partition slow loop
function msl(myArgs=0, iProc=1, nProc=1)
    # some (possibly lengthy) initial calculations
    # ...
    nLoop = 20
    # ...
    # partition my slow loop among nProc CPU cores
    # NOTE: These start and stop indexes default to the full,
    # non-partitioned loop (iProc = nProc = 1) and also correctly
    # handle the unlikely case nProc >> nLoop: a core with nothing
    # to do gets iLoop2 < iLoop1 and skips the loop entirely.
    iLoop1 = round(Int, (iProc - 1) * nLoop / nProc) + 1
    iLoop2 = round(Int, iProc * nLoop / nProc)
    # execute loop (partitioned or not)
    # - the default values of iProc and nProc give the unpartitioned loop
    # - loop iterations must be safe to perform in any order
    for iLoop in iLoop1:iLoop2
        # some probably lengthy calculation
    end
end
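To see that the partitioning behaves as the NOTE claims, here is a quick sanity check (a throwaway snippet of my own, not part of the application) that prints the chunk each core gets and asserts that every iteration is covered exactly once, including in the nProc >> nLoop case:

# throwaway check of the start/stop index formulas
nLoop = 20
for nProc in (1, 3, 8, 64) # 64 exercises the nProc >> nLoop case
    chunks = [(round(Int, (iProc - 1) * nLoop / nProc) + 1):round(Int, iProc * nLoop / nProc)
              for iProc in 1:nProc]
    # concatenating the chunks must reproduce 1:nLoop exactly
    @assert reduce(vcat, collect.(chunks)) == collect(1:nLoop)
    println("nProc = $nProc: ", chunks)
end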
I was not able to use multi-threading, described in Julia’s parallel computing documentation as “usually the easiest way to get parallelism”, because I used PyPlot for its 3D plotting capabilities, and a PyPlot instance is not reentrant: apparently its implementation stores some state in global variables.
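For contrast, the multi-threaded version would have needed little more than a macro on the loop. This sketch is exactly the pattern that fails with PyPlot, because every thread would mutate the same global plotting state:

# sketch of the multi-threaded alternative (start Julia with,
# e.g., julia -t 8 so that Threads.nthreads() > 1)
using Base.Threads
nLoop = 20
Threads.@threads for iLoop in 1:nLoop
    # some probably lengthy calculation; it must not touch shared
    # mutable state such as PyPlot's globals
end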
However, using Distributed as in the above example outline has been a breeze to implement, and I suspect a particularly slow loop that benefits from being distributed over several CPU cores is a common pattern. On the ease of implementation specifically: being able to swap back and forth between the non-distributed application (for debugging and optimizing, by running include("msl.jl"); msl()) and the distributed application (for faster production runs, by running include("msld.jl")), without any changes to the code, is a great convenience.
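Concretely, the swap looks like this at the REPL:

# debugging / optimizing: ordinary single-process run
julia> include("msl.jl"); msl()

# production run: the same loop distributed over 8 cores
julia> include("msld.jl")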
For those interested in seeing more than an example outline, the full Julia code is in the appendix of the (still evolving) essay Zeno’s Enduring Example, and the resulting 3D plot animation of a complex Fourier series tracing the letter ‘e’ is at https://www.youtube.com/watch?v=pTajZtdz5ns
Concerning timing: porting the application from GNU Octave to Julia 1.7.0 gave a 5x speedup (23.3 minutes versus 120 minutes), and using Distributed in Julia 1.7.0 gave a further 2x speedup (11.1 minutes versus 23.3 minutes), for a total 10x speedup. Thank you, Julia designers and developers!
I do have a question, though: the Caveats section of Julia’s multi-threading documentation warns, “Avoid running top-level operations, e.g. include, or eval of type, method, and module definitions in parallel.” As far as I can tell, my application is working. What problem is caused by running a top-level include in parallel?