Pmap slowdown in Julia 0.5?

parallel

#1

I’m having an issue where parallel code works as intended in Julia 0.4, but in Julia 0.5 appears to slow down over time. Specifically, we have a function forecast_one which does computations in blocks. In Julia 0.4, the time per block remains constant after precompiling. After moving to Julia 0.5, the time per block appears to increase linearly as more blocks are computed.

For context, we’re running this function using 10 worker processes on a Linux-based cluster. From watching top on a given compute node, we’ve observed the following:

  • Memory usage is low and roughly constant across processes and nodes over time, as expected
  • In the beginning, the master process is relatively idle, while the worker processes are busy nearly 100% of the time
  • Over time, the worker processes’ CPU usage declines while the master process’s increases, in conjunction with the observed slowdown
  • Asymptotically, the worker processes are completely idle

I understand that pmap was refactored quite a bit between Julia 0.4 and 0.5, and I’m wondering if any of those changes are contributing to the worsening performance for us now in Julia 0.5. Besides making small necessary changes to make our code 0.5-compatible, nothing else is different between the versions of our code run in 0.4 and 0.5.
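One hypothesis I’ve been toying with (this is my own minimal sketch, not anything from our package, and it assumes the 0.5 scheduler re-serializes the mapped function for each dispatch to a worker): since we pass an anonymous closure to pmap, everything the closure captures gets serialized along with it whenever it is shipped to a worker, which would put growing pressure on the master.

```julia
# Sketch: compare the serialized size of a closure capturing a large object
# against a plain named function. `using Serialization` is needed on Julia 1.x;
# on 0.4/0.5, `serialize` lived in Base.
using Serialization

model = ones(500, 500)           # stand-in for a large model object

f = param -> sum(model) * param  # closure capturing `model`
g(param) = 2.0 * param           # named function, no captured state

function serialized_bytes(x)
    io = IOBuffer()
    serialize(io, x)
    length(take!(io))
end

closure_bytes = serialized_bytes(f)  # includes the 500×500 matrix (~2 MB)
plain_bytes   = serialized_bytes(g)  # essentially just a function reference

println("closure: $closure_bytes bytes, plain: $plain_bytes bytes")
```

If that per-dispatch cost is real, it would at least be consistent with the master process becoming the bottleneck, though it wouldn’t by itself explain the linear growth over blocks.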

Unfortunately, I’ve so far been unable to produce a minimal working example illustrating this issue.

Some pseudo-code for forecast_one is below (the actual code is here):

function forecast_one(model, ...)
    for block = 1:nblocks
        # Read from HDF5 file
        parameter_draws = load_draws(model, block)

        # Call pmap
        forecast_output = pmap(param -> forecast_one_draw(model, param, ...), parameter_draws)
        gc()

        # Write to JLD file
        write_forecast_output(forecast_output)
    end
end

The function being pmapped over, forecast_one_draw, basically does a lot of matrix multiplication and linear algebra (but no I/O). It returns a Dict{Symbol, Array{Float64}} where the arrays are large-ish (at most 84 × 229 × 24).
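One workaround we may experiment with (illustrative sketch only, with stand-in names, not our actual package code): pass a batch_size to pmap so each master–worker round trip covers many draws instead of one, and, if we end up on a later version, reuse a CachingPool so the closure (including the captured model) is sent to each worker only once.

```julia
# Illustrative sketch; `forecast_one_draw_demo` is a stand-in kernel.
# On Julia 0.5 the parallel primitives live in Base, so the `using
# Distributed` line is only needed on 0.7 and later.
using Distributed
addprocs(2)

@everywhere forecast_one_draw_demo(model, param) = sum(model) * param

model = ones(10, 10)                 # stand-in for the real model
parameter_draws = collect(1.0:100.0)

# Batching: each round trip now covers 10 draws instead of 1.
out_batched = pmap(p -> forecast_one_draw_demo(model, p), parameter_draws;
                   batch_size = 10)

# CachingPool (0.6+): the closure and its captured `model` are serialized
# to each worker once and then reused across calls.
pool = CachingPool(workers())
out_cached = pmap(p -> forecast_one_draw_demo(model, p), pool, parameter_draws)
```

Whether either of these helps obviously depends on the serialization overhead actually being the problem, which I haven’t confirmed.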

cc: @abhig94 @emoszkowski


#2

I encountered a similar issue once with Julia 0.4. In my case, the problem seemed to be a memory leak or a failure of garbage collection, and it appeared to resolve itself when I upgraded to 0.5. Does your memory usage max out?


#3

No, unfortunately, the memory usage seems to be low and constant over time.