I’m having an issue where parallel code works as intended in Julia 0.4, but in Julia 0.5 appears to slow down over time. Specifically, we have a function `forecast_one` which does its computations in blocks. In Julia 0.4, the time per block remains constant after precompilation. After moving to Julia 0.5, the time per block appears to increase roughly linearly with the number of blocks computed.
For context, we’re running this function with 10 worker processes on a Linux-based cluster. From watching `top` on a given compute node, we’ve observed the following:
- Memory usage stays low and roughly constant across processes and nodes over time, as expected
- At the start, the master process is relatively idle, while the worker processes are busy nearly 100% of the time
- Over time, the workers’ CPU usage declines while the master’s increases, in step with the observed slowdown
- Asymptotically, the worker processes are completely idle
I understand that `pmap` was refactored quite a bit between Julia 0.4 and 0.5, and I’m wondering whether any of those changes could be contributing to the slowdown we now see in 0.5. Apart from the small changes required for 0.5 compatibility, nothing differs between the versions of our code run under 0.4 and 0.5.
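To help isolate this, one thing we may try (a minimal, untested sketch; `manual_pmap` is just a name made up here) is bypassing `pmap` entirely with a hand-rolled feeder built only on primitives that behave the same way in 0.4 and 0.5:

```julia
# Minimal sketch of a hand-rolled pmap replacement, for testing whether
# Julia 0.5's pmap scheduling is the bottleneck. Uses only @sync/@async
# and remotecall_fetch, which exist in both 0.4 and 0.5.
function manual_pmap(f, xs::AbstractVector)
    n = length(xs)
    results = Vector{Any}(n)
    i = 1                    # index of the next unclaimed item
    @sync for w in workers()
        @async while true
            i > n && break
            j = i            # claim item j; tasks only switch at yield
            i += 1           # points, so this read-then-increment is safe
            results[j] = remotecall_fetch(f, w, xs[j])
        end
    end
    return results
end

# Hypothetical usage in place of the pmap call:
# forecast_output = manual_pmap(param -> forecast_one_draw(model, param), parameter_draws)
```

If the slowdown disappears with this feeder, that would point at the 0.5 `pmap` internals rather than at our own code.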
Unfortunately, I’ve so far been unable to produce a minimal working example illustrating this issue.
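In case anyone else wants to take a stab at a reproduction, a probe along these lines (repeated `pmap` calls over trivial work, with payloads shaped like ours) is roughly the form I’d expect one to take; whether the per-call time grows is exactly the open question:

```julia
# Hypothetical probe: call pmap repeatedly with trivial work that returns
# a Dict shaped like one draw's output, and watch the time per call.
addprocs(10)

for k = 1:50
    t = @elapsed pmap(x -> Dict(:a => rand(84, 229, 24)), 1:100)
    println("pmap call $k: $(round(t, 2)) s")  # flat in 0.4; does it grow in 0.5?
end
```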
Some pseudo-code for `forecast_one` is below (the actual code is here):
```julia
function forecast_one(model, ...)
    for block = 1:nblocks
        # Read parameter draws for this block from an HDF5 file
        parameter_draws = load_draws(model, block)

        # Forecast all draws in the block in parallel
        forecast_output = pmap(param -> forecast_one_draw(model, param, ...), parameter_draws)
        gc()

        # Write this block's output to a JLD file
        write_forecast_output(forecast_output)
    end
end
```
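To pin down which step grows, the obvious modification (illustrative only; the `@elapsed` lines are not in the actual code) would be per-step timing in the block loop:

```julia
# Hypothetical per-step timing, to confirm that the pmap call (rather than
# the HDF5 read or the JLD write) is the component that grows per block.
for block = 1:nblocks
    t_load = @elapsed parameter_draws = load_draws(model, block)

    t_pmap = @elapsed forecast_output =
        pmap(param -> forecast_one_draw(model, param), parameter_draws)  # extra args omitted
    gc()

    t_write = @elapsed write_forecast_output(forecast_output)

    println("block $block: load=$(round(t_load, 2))s, pmap=$(round(t_pmap, 2))s, write=$(round(t_write, 2))s")
end
```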
The function being `pmap`ped over, `forecast_one_draw`, basically does a lot of matrix multiplication and linear algebra (but doesn’t involve any I/O). It returns a `Dict{Symbol, Array{Float64}}` where the arrays are large-ish (at most 84 × 229 × 24).
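For a sense of scale, a quick back-of-the-envelope calculation (our arithmetic, not anything measured) puts the largest of those arrays at a few megabytes, so every draw ships a nontrivial payload back to the master:

```julia
# One 84 × 229 × 24 Float64 array, at 8 bytes per element:
84 * 229 * 24 * 8      # = 3_693_312 bytes
3_693_312 / 1024^2     # ≈ 3.5 MiB per array, before Dict and Symbol overhead
```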
cc: @abhig94 @emoszkowski