How to avoid repeated data movement between processes?

Do you mean that `nm` is different on each iteration? For `CachingPool` to be useful, I believe you need to be re-using a particular value of `nm` across all workers: you capture it in a closure and then repeatedly call that closure.
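For instance, a minimal sketch of that pattern (the `process_row` function and the array sizes here are made up for illustration):

```julia
using Distributed
addprocs(4)

# hypothetical expensive-to-move object standing in for your `nm`
nm = rand(1000, 1000)

# the processing function must be defined on every worker
@everywhere process_row(x, nm) = sum(nm) + x

pool = CachingPool(workers())
# the closure captures `nm`; the CachingPool ships it to each worker only
# once and reuses the cached copy on subsequent calls
results = pmap(x -> process_row(x, nm), pool, 1:100)
```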

If you can construct `nm` on the worker processes themselves, that's usually a good call (e.g. maybe only one field of `nm` changes on each iteration, so you only have to transfer that field?)
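A sketch of that approach, assuming a made-up type `NM` with one big field built locally on each worker and one small field that changes per iteration:

```julia
using Distributed
addprocs(4)

@everywhere begin
    mutable struct NM                    # hypothetical type
        big::Matrix{Float64}             # expensive part, constructed locally
        scale::Float64                   # small field that changes per iteration
    end
    # constructed on each worker itself, so the big matrix never travels
    const nm = NM(rand(1000, 1000), 1.0)
    set_scale!(s) = (nm.scale = s)
end

for iter in 1:10
    s = iter * 0.1
    # `$` interpolation sends just the Float64, not the whole struct
    @everywhere set_scale!($s)
end
```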

Finally, you have a couple of other options as well that are more explicit about the data movement:

```julia
# remotecall this function on all your workers and you can keep feeding it new
# jobs to work on through the `input` channel, but that big matrix only gets
# moved once when you first do the `remotecall`
function remote_processing_loop(input::RemoteChannel, output::RemoteChannel,
                                big_matrix_i_only_want_to_move_once)
    while true
        x = take!(input)
        x === nothing && break
        put!(output, process(x, big_matrix_i_only_want_to_move_once))
    end
end
```
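Driving that loop from the master process might look like this (the `process` definition and the channel sizes are placeholders):

```julia
using Distributed
addprocs(2)

@everywhere begin
    process(x, M) = sum(M) * x           # hypothetical processing function
    function remote_processing_loop(input::RemoteChannel, output::RemoteChannel,
                                    big_matrix)
        while true
            x = take!(input)
            x === nothing && break
            put!(output, process(x, big_matrix))
        end
    end
end

input  = RemoteChannel(() -> Channel{Any}(32))
output = RemoteChannel(() -> Channel{Any}(32))
big = rand(100, 100)

# the matrix moves once per worker, at `remotecall` time
for p in workers()
    remotecall(remote_processing_loop, p, input, output, big)
end

for x in 1:10
    put!(input, x)                       # only `x` moves per job
end
results = [take!(output) for _ in 1:10]
foreach(_ -> put!(input, nothing), workers())   # shut the loops down
```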
```julia
# alternatively you can define a worker module and store the complex types that
# you don't want to move frequently as global variables
module Worker

function transfer_big_matrix(new_big_matrix)
    global big_matrix = new_big_matrix
end

function process(x)
    process(x, big_matrix)
end

function process(x, big_matrix)
    # do the data processing here
end

end
```
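Usage from the master process could then look roughly like this (assuming the module above lives in a hypothetical `worker.jl` that you include on every worker):

```julia
using Distributed
addprocs(2)

@everywhere include("worker.jl")   # hypothetical file holding `module Worker`

big = rand(1000, 1000)
# pay the transfer cost once per worker, up front
for p in workers()
    remotecall_wait(Worker.transfer_big_matrix, p, big)
end

# later calls move only `x`; each worker reads its own local `big_matrix`
results = pmap(Worker.process, 1:100)
```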