How to distribute an array by column?

Background:

using Distributed
@everywhere using DistributedArrays
addprocs(4)

A=[1 1 2 2; 
   1 1 2 2; 
   3 3 4 4;
   3 3 4 4]
adist = distribute(A)

I found that A is distributed in checkerboard blocks, which means

fetch( @spawnat 2 localpart(adist))

will return

[1 1;
 1 1]

But I want A to be distributed by column, which means

fetch( @spawnat 2 localpart(adist))

will return

[1,
 1,
 3,
 3]

How can I control how the array is distributed (e.g., only by column)?

Thanks,

You can control this with the dist keyword argument, which sets the number of partitions per dimension:

julia> adist = distribute(A; dist=(1, nworkers()))
4×4 DArray{Int64,2,Array{Int64,2}}:
 1  1  2  2
 1  1  2  2
 3  3  4  4
 3  3  4  4

julia> fetch( @spawnat 2 localpart(adist))
4×1 Array{Int64,2}:
 1
 1
 3
 3

(As an aside, note that you should run @everywhere using DistributedArrays after calling addprocs, so that the package is also loaded on the newly added workers.)
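
For reference, here is a minimal sketch of the setup with the order swapped, assuming a fresh session with DistributedArrays installed. Picking the worker with workers()[1] instead of the literal id 2 is my own choice for illustration; in a fresh session they should be the same process.

```julia
using Distributed
addprocs(4)                           # start the workers first

# @everywhere only reaches processes that exist when it runs,
# so load the package after addprocs.
@everywhere using DistributedArrays

A = [1 1 2 2;
     1 1 2 2;
     3 3 4 4;
     3 3 4 4]

# One partition along the rows, one per worker along the columns.
adist = distribute(A; dist=(1, nworkers()))

# Local block held by the first worker (assumed to be process 2 here).
first_col = fetch(@spawnat workers()[1] localpart(adist))
```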

I tested that

adist2 = distribute(A; dist=(4, 1))  #by row
adist3 = distribute(A; dist=(1, 4))  #by column
adist4 = distribute(A; dist=(2, 2))  #checkerboard

The documentation says “dist optionally specifies a vector or tuple of the number of partitions in each dimension”.

But I am still confused about the two values in dist=(parameter1, parameter2).

Could you please explain with more detail? Thanks for your time.

The dist tuple has as many elements as there are dimensions in A. It specifies how many partitions (or splits) should be used within each dimension.

The first dimension is rows — and there we don’t want any partitions, so we use a 1.

The second dimension is the columns — and there we want as many partitions as there are workers.
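
To make the tuple concrete, here is a rough plain-Julia sketch of how dist=(r, c) carves a matrix into an r-by-c grid of blocks. It does not use DistributedArrays at all: split_ranges and blocks are hypothetical helpers written only for this illustration, and the package's actual chunking may handle sizes that don't divide evenly in a different way.

```julia
# Hypothetical helper: split 1:n into k contiguous ranges of near-equal length.
function split_ranges(n::Int, k::Int)
    bounds = [round(Int, i * n / k) for i in 0:k]
    return [(bounds[i] + 1):bounds[i + 1] for i in 1:k]
end

# Hypothetical helper: cut A into a dist[1]-by-dist[2] grid of blocks.
function blocks(A::AbstractMatrix, dist::Tuple{Int,Int})
    rows = split_ranges(size(A, 1), dist[1])   # partitions along dimension 1
    cols = split_ranges(size(A, 2), dist[2])   # partitions along dimension 2
    return [A[r, c] for r in rows, c in cols]
end

A = [1 1 2 2;
     1 1 2 2;
     3 3 4 4;
     3 3 4 4]

by_row    = blocks(A, (4, 1))   # 4 row partitions, 1 column partition
by_column = blocks(A, (1, 4))   # 1 row partition, 4 column partitions
checker   = blocks(A, (2, 2))   # 2 partitions in each dimension
```

With (1, 4), each block keeps all four rows but only one column, which is the by-column layout asked about above.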

I got it. Thank you very much!