How to avoid repeated data movement between processes?

Do you mean that `nm` is different on each iteration? For `CachingPool` to be useful, I believe you need to be re-using a particular value of `nm` across all workers: you capture it in a closure and then repeatedly call that closure.
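For instance, a minimal sketch of that pattern (the `process_row` function and the array sizes here are made up for illustration):

```julia
using Distributed
addprocs(4)

# hypothetical expensive-to-move object standing in for your `nm`
nm = rand(1000, 1000)

# the processing function must be defined on every worker
@everywhere process_row(x, nm) = sum(nm) + x

pool = CachingPool(workers())
# the closure captures `nm`; the CachingPool ships it to each worker only
# once and reuses the cached copy on subsequent calls
results = pmap(x -> process_row(x, nm), pool, 1:100)
```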

If you can construct `nm` on the worker processes themselves, that's usually a good call (e.g. maybe only one field of `nm` changes on each iteration, so you only have to transfer that field?)
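A sketch of that approach, assuming a made-up type `NM` with one big field built locally on each worker and one small field that changes per iteration:

```julia
using Distributed
addprocs(4)

@everywhere begin
    mutable struct NM                    # hypothetical type
        big::Matrix{Float64}             # expensive part, constructed locally
        scale::Float64                   # small field that changes per iteration
    end
    # constructed on each worker itself, so the big matrix never travels
    const nm = NM(rand(1000, 1000), 1.0)
    set_scale!(s) = (nm.scale = s)
end

for iter in 1:10
    s = iter * 0.1
    # `$` interpolation sends just the Float64, not the whole struct
    @everywhere set_scale!($s)
end
```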

Finally, you have a couple of other options as well that are more explicit about the data movement:

```julia
# remotecall this function on all your workers and you can keep feeding it new
# jobs to work on through the `input` channel, but that big matrix only gets
# moved once when you first do the `remotecall`
function remote_processing_loop(input::RemoteChannel, output::RemoteChannel,
                                big_matrix_i_only_want_to_move_once)
    while true
        x = take!(input)
        x === nothing && break
        put!(output, process(x, big_matrix_i_only_want_to_move_once))
    end
end
```
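Driving that loop from the master process might look like this (the `process` definition and the channel sizes are placeholders):

```julia
using Distributed
addprocs(2)

@everywhere begin
    process(x, M) = sum(M) * x           # hypothetical processing function
    function remote_processing_loop(input::RemoteChannel, output::RemoteChannel,
                                    big_matrix)
        while true
            x = take!(input)
            x === nothing && break
            put!(output, process(x, big_matrix))
        end
    end
end

input  = RemoteChannel(() -> Channel{Any}(32))
output = RemoteChannel(() -> Channel{Any}(32))
big = rand(100, 100)

# the matrix moves once per worker, at `remotecall` time
for p in workers()
    remotecall(remote_processing_loop, p, input, output, big)
end

for x in 1:10
    put!(input, x)                       # only `x` moves per job
end
results = [take!(output) for _ in 1:10]
foreach(_ -> put!(input, nothing), workers())   # shut the loops down
```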
```julia
# alternatively you can define a worker module and store the complex types that
# you don't want to move frequently as global variables
module Worker

function transfer_big_matrix(new_big_matrix)
    global big_matrix = new_big_matrix
end

function process(x)
    process(x, big_matrix)
end

function process(x, big_matrix)
    # do the data processing here
end

end
```
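Usage from the master process could then look roughly like this (assuming the module above lives in a hypothetical `worker.jl` that you include on every worker):

```julia
using Distributed
addprocs(2)

@everywhere include("worker.jl")   # hypothetical file holding `module Worker`

big = rand(1000, 1000)
# pay the transfer cost once per worker, up front
for p in workers()
    remotecall_wait(Worker.transfer_big_matrix, p, big)
end

# later calls move only `x`; each worker reads its own local `big_matrix`
results = pmap(Worker.process, 1:100)
```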