I am using Julia PyPlot to do postprocessing for simulation data. Here is the sample script for my parallel work:
using Distributed
@everywhere using PyCall
@everywhere matplotlib = pyimport("matplotlib")
@everywhere matplotlib.use("Agg")
@everywhere using PyPlot, Glob
@everywhere function process(filename::String, dir::String=".")
    np = pyimport("numpy")
    # readdata is defined elsewhere (my own reader for the simulation output);
    # `time` is assumed to be set during the postprocessing step below.
    filehead, data, filelist = readdata(filename, dir=dir, verbose=false)
    # Postprocessing...
    plt.savefig("$(time).png")
    println("finished saving $(time)!")
end
# Define path and filenames
dir = ".";
filename = "y*.out";
# Find filenames
# ......
# Processing
@distributed for filename in filenames
    println("filename: $(filename)")
    process(filename, dir)
end
In this way, I find all the filenames on one processor, distribute the names among the workers, and do the plotting and saving on each worker using @distributed. Since there is no dependency between processing different files, this seems to work.
I am wondering if there are better ways to do this, say, using @threads or a Channel? Any ideas are appreciated!
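For comparison, here is a minimal sketch of the same workflow using pmap instead of @distributed; it blocks until all files are processed and load-balances across the workers. It assumes process is defined @everywhere as above and that the output files match the y*.out pattern:

using Distributed, Glob

filenames = glob("y*.out", ".")
# pmap blocks until every file is done and balances the work across workers.
pmap(f -> process(f, "."), filenames)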
Later I encountered an issue when running this script from the command line, but not in the REPL. In REPL mode everything looks fine and the plots are saved in PNG format. However, if I just type
julia process.jl
then the function process is never executed, and no error message is returned. What is wrong here? I feel like the scheduled task in the queue is never executed.
The driver script is sending the tasks off to the workers and exiting immediately, because the @distributed for loop returns immediately. This in turn kills the workers immediately. It doesn't happen in the REPL because the REPL stays alive after you run each command.
You need to have your driver script wait for all the workers to finish, for example by doing a dummy reduce.
The docs for @distributed say that if you give it a reducer it’ll wait for the workers, so that it can compute the reduction. So you can have each worker return something, say 1, and use + as the reducer.
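As a sketch, either of the following keeps the driver script alive until every file has been handled (filenames, dir, and process are as in the original script):

# Dummy reduction: each iteration returns 1 and (+) is the reducer,
# so the call blocks until all workers have finished.
total = @distributed (+) for filename in filenames
    process(filename, dir)
    1
end

# Alternative: wrap the reducer-less loop in @sync so the script waits on it.
@sync @distributed for filename in filenames
    process(filename, dir)
end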
Yeah, it only works for v1.3.
I have a small benchmark in the Readme indicating roughly where the overhead caused by my implementation strategy starts becoming noticeable.
If I understand the single global lock on libuv correctly, it means that interacting with libuv is done on a privileged thread, but the actual I/O can happen in parallel? If not, then threads wouldn't help for I/O-heavy tasks…