Trying to understand communication between workers in simple version of pmap

Bruno_Amorim · December 21, 2019, 1:11am

I am trying to understand how different workers can communicate in Julia.

The Parallel Computing section of the documentation for earlier versions of Julia (<= v0.6) gave a simple pmap function as an example. Adapted to work in Julia >= 1.0, it reads:

function simplepmap(fun, collection)

    n = length(collection)    
    results = Vector{Any}(undef, n)
    
    i = 1
    # function to produce the next work item from the queue.
    # in this case it's just an index.
    nextidx() = (idx=i; i+=1; idx)
    @sync for worker in workers()
        @async begin
            while true
                idx = nextidx()
                if idx > n
                    break
                end
                results[idx] = remotecall_fetch(fun, worker, collection[idx])
            end
        end
    end
    
    return results
end

In this code, the role of the function nextidx is to return the current value of the counter i and then increment it, thus moving to the next element of collection.

However, if we try to write the code as:

function simplepmap_wrong(fun, collection)
    
    n = length(collection)    
    results = Vector{Any}(undef, n)
    
    idx = 0
    @sync for worker in workers()
        @async begin
            while true
                idx += 1
                if idx > n
                    break
                end
                results[idx] = remotecall_fetch(fun, worker, collection[idx])
            end
        end
    end
    
    return results
end

This code fails.

For example if we try:

julia> using Distributed # needed in Julia >= 1.0
julia> addprocs(4)
julia> @everywhere f(x)=2*x
julia> xs = collect(1:10)
julia> simplepmap(f, xs) # works
julia> simplepmap_wrong(f, xs) # This Fails!: BoundsError: attempt to access 10-element Array{Any,1} at index [20]

However, in simplepmap_wrong, the line idx += 1 increases idx by one, which seems to be the same thing that idx = nextidx() does in simplepmap.

Why is there this difference in behaviour?

Thank you for your help!

(P.S.: I also noticed that the documentation for versions >1.0 still has a paragraph that explains the simple pmap function, even though the code is no longer shown. Is this a relic from past versions of the documentation that should be corrected?)

Sijun · December 21, 2019, 8:44am

I think access to idx must be protected by a local scope for the code to work as intended:

let idx = idx
    results[idx] = remotecall_fetch(fun, worker, collection[idx])
end
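
As a minimal sketch of why a new binding helps, the following compares a task that closes over a let-bound copy with a task that closes over the shared variable. The function names let_capture_demo and shared_capture_demo are hypothetical, and sleep is only a stand-in for a slow remote call:

```julia
# Hypothetical illustration; `sleep` stands in for a slow remote call.
function let_capture_demo()
    idx = 1
    t = let idx = idx              # new binding, initialized with the current value
        @async (sleep(0.01); idx)  # the task sees the inner binding only
    end
    idx = 99                       # rebinding the outer variable does not affect the task
    return fetch(t), idx
end

function shared_capture_demo()
    idx = 1
    t = @async (sleep(0.01); idx)  # the task shares the outer binding
    idx = 99                       # visible to the task, which has not run yet
    return fetch(t), idx
end

let_capture_demo()     # (1, 99)
shared_capture_demo()  # (99, 99)
```

The closure created by @async captures the variable itself, not its value at creation time, so only the let version keeps the value it started with.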

Bruno_Amorim · January 2, 2020, 3:21pm

Thanks a lot for the suggestion, @Sijun! (Sorry for taking so long to provide feedback.)

Indeed the code:

function simplepmap_fixed(fun, collection)
    
    n = length(collection)    
    results = Vector{Any}(undef, n)
    
    idx = 0
    @sync for worker in workers()
        @async begin
            while true
                idx += 1
                if idx > n
                    break
                end
                let idx = idx
                    results[idx] = remotecall_fetch(fun, worker, collection[idx])
                end
            end
        end
    end
    
    return results
end

works as intended. However, I do not understand why the variable idx must be protected. Is this kind of behaviour documented somewhere, so that I can learn more about it?

Thanks in advance!

Sijun · January 2, 2020, 4:00pm

Actually, the let keyword is not needed. You only need to copy the value of idx, which is shared among tasks, into another variable that is local to each task. So the following code would also work:

function simplepmap_fixed(fun, collection)
    
    n = length(collection)    
    results = Vector{Any}(undef, n)
    
    idx = 0
    @sync for worker in workers()
        @async begin
            while true
                idx += 1
                local i = idx # capture the value of idx by i
                if i > n
                    break
                end
                results[i] = remotecall_fetch(fun, worker, collection[i])
            end
        end
    end
    
    return results
end

In the above, the value of idx is captured by a task-local variable i.

In the original function simplepmap(), idx = nextidx() does the same thing: it captures the value of the shared variable i in a task-local variable idx.
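
A small sketch of that counter pattern in isolation may help. The name counter_demo is hypothetical; nextidx is copied from simplepmap:

```julia
# Hypothetical illustration of the nextidx pattern outside of any task.
function counter_demo()
    i = 1
    # idx is local to nextidx; only i is shared with the enclosing function.
    nextidx() = (idx = i; i += 1; idx)
    a = nextidx()   # a receives the old value of i (1); i becomes 2
    b = nextidx()   # b receives 2; i becomes 3
    return (a, b, i)
end

counter_demo()  # (1, 2, 3)
```

Each caller receives a value, not a reference to i, so later increments of i cannot change what the caller already holds.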

In the while loop of the function simplepmap_wrong(), shown below with points (1) and (2) marked, the value of idx at point (2) may differ from its value at point (1). While remotecall_fetch() waits for the worker, the current task yields, and the other tasks can run and increment idx. The index in results[idx] is read only after remotecall_fetch() returns, so the result is stored at a location different from the one chosen at point (1), possibly beyond the end of the array. This is not what you intend.

while true
    idx += 1 # point (1)
    if idx > n
        break
    end
    results[idx] = remotecall_fetch(fun, worker, collection[idx]) # point (2)
end
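
The effect can be reproduced without any workers. The following is a sketch under assumptions: slow_double and shared_index_demo are hypothetical names, and sleep stands in for the wait inside remotecall_fetch, since both let other tasks run:

```julia
# Hypothetical stand-in for remotecall_fetch: sleeping yields to other tasks,
# much as waiting for a remote worker does.
slow_double(x) = (sleep(0.01); 2x)

# Counts how often the shared idx changed while a task was waiting.
function shared_index_demo(n, ntasks)
    mismatches = 0
    idx = 0                          # shared by every task, as in simplepmap_wrong
    @sync for _ in 1:ntasks
        @async while true
            idx += 1                 # point (1)
            idx > n && break
            before = idx             # task-local snapshot of the index
            slow_double(before)      # the task yields here
            # point (2): another task may have advanced idx in the meantime
            idx != before && (mismatches += 1)
        end
    end
    return mismatches
end

shared_index_demo(10, 1)  # 0: a single task never sees idx change
shared_index_demo(10, 4)  # positive: the tasks overwrite each other's idx
```

With a single task the count is zero. With several tasks it is positive, which is the situation that produces the BoundsError in simplepmap_wrong.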

You may put @show in front of idx for debugging:

while true
    idx += 1 # point (1)
    @show idx
    if idx > n
        break
    end
    results[@show idx] = remotecall_fetch(fun, worker, collection[idx]) # point (2)
end

I think the “Scope of Variables” section of the manual would help:
https://docs.julialang.org/en/v1/manual/variables-and-scoping/

Bruno_Amorim · January 3, 2020, 11:39am

Thank you very much for the explanation! I will look into the scoping rules.
