Pmap with in place functions

gideonsimpson · April 25, 2018, 1:53am

I was hoping to use pmap with an in place function, but I’m finding the behavior is not as expected. This example illustrates the issue:

addprocs(2);
@everywhere function f!(x)
    @. x+= randn();
    x
end

X = [zeros(2) for i=1:10];
srand(100);
pmap(f!,X);

When I inspect X after running, I find that it still contains all zeros, which is not what I expected. If I don’t add any processes, and only have worker 1, it does work as I had hoped.

tbeason · April 25, 2018, 5:22pm

The way I overcame this is to have an intermediate function that first constructs the arrays and then calls the inplace function - and that is what I pass to pmap. Which really isn’t in place at all, but it does fine for my use case.

gideonsimpson · April 25, 2018, 7:07pm

That sort of defeats the purpose of the inplace operation. The whole reason I have the inplace function to begin with is to avoid more allocations.

standarddeviant · May 11, 2018, 9:49pm

I’m in a very similar situation. For the sake of simplicity, I’d like to not make my code in to a module, and I want one function in the file to call the other in a parpool/pmap fashion.

I thought an easy way to do this would be to simply include the file with @everywhere, but this has proved to be tricky. My current attempt is failing.

I have a file named pmap_test.jl :

# pmap_test.jl

function do_the_work(x)
    x * 2
    sleep(1)
end

function do_the_pmap()
    num_workers = 8
    wpool = WorkerPool(addprocs(num_workers)) # make worker pool

    # make all functions in this file available to all workers
    @everywhere the_source_file = realpath(Base.source_path())
    @everywhere include(the_source_file)

    # use pmap with wpool
    pmap(wpool, do_the_work, collect(1:num_workers*3))

    # attempt to remove worker procs associated with wpool
    rmprocs(wpool.workers)
end

When I try to include this file and call do_the_pmap from a notebook, I get the following:

On worker 2:
SystemError: realpath: No error
#systemerror#44 at .\error.jl:64 [inlined]
systemerror at .\error.jl:64
realpath at .\path.jl:287
eval at .\boot.jl:235
eval_ew_expr at .\distributed\macros.jl:116
#106 at .\distributed\process_messages.jl:268 [inlined]
run_work_thunk at .\distributed\process_messages.jl:56
macro expansion at .\distributed\process_messages.jl:268 [inlined]
#105 at .\event.jl:73
#remotecall_fetch#141(::Array{Any,1}, ::Function, ::Function, ::Base.Distributed.Worker, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:354
remotecall_fetch(::Function, ::Base.Distributed.Worker, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:346
#remotecall_fetch#144(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:367
remotecall_fetch(::Function, ::Int64, ::Expr, ::Vararg{Expr,N} where N) at .\distributed\remotecall.jl:367
(::##25#29)() at .\distributed\macros.jl:102

...and 40 more exception(s).


Stacktrace:
 [1] sync_end() at .\task.jl:287
 [2] macro expansion at .\distributed\macros.jl:112 [inlined]
 [3] do_the_pmap() at pmap_test.jl:13

Is there a straightforward way to include a simple .jl file with functions and make those functions usable on all workers?

Elrod · May 12, 2018, 12:29am

Is there a straightforward way to include a simple .jl file with functions and make those functions usable on all workers?

@everywhere include("simple_jl_file_with_functions.jl")

If you’re desperate to avoid allocations and want to do in place operations, I’d try threading instead of pmap. But I wouldn’t expect that to be straightforward either. You don’t want separate processes writing into regions of memory that are close to each other.

gideonsimpson · May 12, 2018, 5:39pm

Won’t threading limit the approach the algorithm to a shared memory environment?

Also, I’m really surprised that a multiprocessing/multithreading application of an in place function to, appropriately defined, data structures is not straightforward.

SebastianM-C · May 12, 2018, 10:48pm

I think that you should use SharedArrays when writing to an array from multiple processes. See https://docs.julialang.org/en/stable/manual/parallel-computing/#man-shared-arrays-1 and https://docs.julialang.org/en/stable/manual/parallel-computing/#Parallel-Map-and-Loops-1

Elrod · May 13, 2018, 2:49am

I’d also look into DistributedArrays:

standarddeviant · May 13, 2018, 12:47pm

I actually just want to write separate files to disk in each process. It’s “embarrassingly parallel.”

gideonsimpson · May 15, 2018, 9:15pm

I’ve heard this suggestion of SharedArrays before, but no one seems to have an example of using them together with pmap or some other parallel mechanism for such an in place manipulation.

SebastianM-C · May 15, 2018, 9:58pm

The second link (https://docs.julialang.org/en/stable/manual/parallel-computing/#Parallel-Map-and-Loops-1) has an example with this and explains why your example does not work (each process has a separate copy of the array).
The idea is to replace the Array you are writing to from multiple processes with a SharedArray.

Topic		Replies	Views
Pmap in function/module New to Julia	3	1591	July 1, 2019
Adding data to worker processes via @everywhere General Usage	12	1200	January 11, 2019
Pmap() data inputs New to Julia	4	315	March 20, 2023
`pmap` with shared arrays in side a custom type? General Usage question	2	901	November 18, 2016
@everywhere and pmap inside of a function? Julia at Scale question , parallel	14	3824	August 17, 2017

Pmap with in place functions

Related topics