How to access the localpart of a distributed array?

xiaodai · October 26, 2017, 9:44pm

I am trying to learn distributed arrays. I can distribute the arrays using distribute, but I can’t work out how to get the localparts of the arrays.

As an MWE, here is what I am trying to do:

addprocs()
@everywhere using DistributedArrays

a = rand(1:5, 1_000_000)
b = rand(1_000_000)

@time da = distribute(a)
@time db = distribute(b)

sum(da) # these work fine
sum(db)

# I want to do something like
@everywhere res = localpart(da) .* localpart(db)

But it says da and db are not defined on the workers, which makes sense. So how can I define da and db on the workers?

My actual use case uses much more complicated algorithms, but I would like to work with two or more distributed vectors.

Also, I just realised: wouldn’t it cause an issue if the two vectors are distributed differently across the workers? In that case I need finer control over how I distribute the vectors too.

raminammour · October 26, 2017, 10:41pm

Two things:

1- If you just do res = da .* db, then res is a distributed array whose localparts correspond to what I think you want.
2- The message that you are seeing has to do with @everywhere. From the help:

@everywhere bar=1

will define Main.bar on all processes.

Unlike @spawn and @spawnat, @everywhere does not capture any local variables. Prefixing @everywhere with @eval allows us to broadcast local variables using interpolation:

  foo = 1
  @eval @everywhere bar=$foo

So if you really wanted res to be the same variable name everywhere, this will do:

@eval @everywhere res=localpart($da).*localpart($db)
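
On Julia 1.x, `@everywhere` accepts `$` interpolation directly, so the `@eval` prefix should no longer be needed. A minimal sketch of the same idea, assuming two local workers, an installed DistributedArrays.jl, and small illustrative vectors (the worker count and sizes are assumptions, not from the thread):

```julia
using Distributed
addprocs(2)                          # assumption: two local worker processes
@everywhere using DistributedArrays

da = distribute(rand(1:5, 1_000))    # illustrative sizes, not the 1_000_000 from the question
db = distribute(rand(1_000))

# $da and $db are spliced into the expression on the master before it is sent out,
# so every process receives a handle to the same two distributed arrays.
@everywhere res = localpart($da) .* localpart($db)

# Each process reports on its own Main.res; on the master the local part is empty.
@everywhere println(myid(), " holds ", length(res), " elements")
```

Note that `res` here is an ordinary global `Array` on each process, not a `DArray`; the pieces are only related by convention.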

xiaodai · October 27, 2017, 9:15pm

Aren’t distributed arrays useful because they let us parallelize operations? Does res = da .* db auto-parallelize? It is not clear to me how to do this.

ChrisRackauckas · October 27, 2017, 9:58pm

Yes and yes, but it needs help.

https://github.com/JuliaParallel/DistributedArrays.jl/blob/7d213890aec31fef29364a4dc2a60cffb0faaf5c/src/mapreduce.jl#L23

          Base.Broadcast.promote_containertype(::Type{DArray}, ::Type{DArray}) = DArray
          Base.Broadcast.promote_containertype(::Type{DArray}, ::Type{Array})  = DArray
          Base.Broadcast.promote_containertype(::Type{DArray}, ct)             = DArray
          Base.Broadcast.promote_containertype(::Type{Array}, ::Type{DArray})  = DArray
          Base.Broadcast.promote_containertype(ct, ::Type{DArray})             = DArray
          
          Base.Broadcast.broadcast_indices(::Type{DArray}, A)      = indices(A)
          Base.Broadcast.broadcast_indices(::Type{DArray}, A::Ref) = ()
          
          # FixMe!
          ## 1. Support for arbitrary indices including OneTo
          ## 2. This is as type unstable as it can be. Overhead might not matter too much for DArrays though.
          function Base.Broadcast.broadcast_c(f, ::Type{DArray}, As...)
              T     = Base.Broadcast._broadcast_eltype(f, As...)
              shape = Base.Broadcast.broadcast_indices(As...)
              iter  = Base.CartesianRange(shape)
              D     = DArray(map(length, shape)) do I
                  Base.Broadcast.broadcast_c(f, Array,
                      map(a -> isa(a, Union{Number,Ref}) ? a :
                          localtype(a)(a[ntuple(i -> i > ndims(a) ? 1 : (size(a, i) == 1 ? (1:1) : I[i]), length(shape))...]), As)...)

raminammour · October 27, 2017, 10:21pm

Yes, as Chris said. Not everything is implemented or optimal, but for simple operations it is.
The easiest way to check is:

@which da.*db

If it is not implemented, it falls back on the AbstractArray implementation (which can be slow).

In this case you will see that it is implemented and does what you want, efficiently.
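
To make this concrete, here is a small sketch showing that the broadcast result is itself a `DArray` and agrees with the serial computation. It assumes a recent DistributedArrays.jl on Julia 1.x, two local workers, and small illustrative vectors; none of those specifics come from the thread.

```julia
using Distributed
addprocs(2)                          # assumption: two local worker processes
@everywhere using DistributedArrays

a = rand(1:5, 1_000)                 # illustrative sizes
b = rand(1_000)
da = distribute(a)
db = distribute(b)

res = da .* db                       # broadcast over two DArrays

println(typeof(res))                 # expected to be a DArray type, not a plain Array
println(sum(res) ≈ sum(a .* b))      # compare against the serial result
```

The comparison uses `≈` rather than `==` because the distributed reduction may add the floating-point terms in a different order.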

xiaodai · October 27, 2017, 10:22pm

Actually, my use case is a lot more complicated than .*. I guess I have to learn a lot more about distributed arrays before I can do what I want with them.

raminammour · October 27, 2017, 11:57pm

A more general way to code with them is as follows:

da = DArray(...)
for ip in procs(da)
    @spawnat ip begin
        # do things with localpart(da), or access other parts (which entails communication)
    end
end

xiaodai · October 27, 2017, 11:59pm

How do I refer to da in each of the subprocesses?

raminammour · October 28, 2017, 5:37am

If you use the @spawnat macro, then just da:

@spawnat procs(da)[2] sum(localpart(da))

That would work, for example. @spawnat captures local variables, unlike @everywhere.
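
Putting the loop pattern and the `@spawnat` one-liner together, a minimal sketch might look like the following. The helper name `localsums`, the worker count, and the vector size are hypothetical choices for illustration; the sketch assumes Julia 1.x with DistributedArrays.jl installed.

```julia
using Distributed
addprocs(2)                          # assumption: two local worker processes
@everywhere using DistributedArrays

# Hypothetical helper: run a reduction on every process that owns a chunk of d.
# d is a local variable here; @spawnat captures it in the closure it ships out.
function localsums(d::DArray)
    futures = [@spawnat p sum(localpart(d)) for p in vec(procs(d))]
    return map(fetch, futures)       # wait for and collect each partial result
end

a = rand(1:5, 1_000)                 # illustrative size
da = distribute(a)

partial = localsums(da)              # one partial sum per owning worker
println(sum(partial) == sum(a))      # integer sums, so exact comparison is fine
```

Because each task only touches `localpart(d)`, no array data moves between workers; only the scalar partial sums travel back to the caller.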
