I was trying to learn distributed array. And I can distribute the arrays using distribute, however I can’t work out how to get the localparts of arrays.
As a MWE here what I am trying to do
addprocs()
@everywhere using DistributedArrays
a = rand(1:5, 1_000_000)
b= rand(1_000_000)
@time da= distribute(a)
@time db= distribute(a)
sum(da) #these work fine
sum(db)
# I want to something like
@everywhere res = localpart(da) .* localpart(db)
But it says da and db are not defined on the workers, which makes sense, so how can define da and db in the workers?
My actual use cases uses much more complicated algorithms, but I would lilke to work with two or more distributed vectors.
Also just realised, wouldn’t it cause an issue where the vectors are distributed differently on different workers? I need finer control over how I distribute the vectors too then.
1- if you just do res=da.*db, then res is a distributed array with its localparts corresponding to what I think you want.
2- the message that you are seeing has to do with @everywhere. from the help:
@everywhere bar=1
will define Main.bar on all processes.
Unlike @spawn and @spawnat, @everywhere does not capture any local variables. Prefixing @everywhere with @eval allows us to broadcast local variables using interpolation :
foo = 1
@eval @everywhere bar=$foo
So if you really wanted res to be the same variable name everywhere, this will do:
The reason why distributed arrays are useful is because we can parallelize operations? Does res=da.*db auto-parallelize? The solution on how to do this is not clear.
Actually my use-case was alot more complicated than .*. I guess I have to learn alot more about distributed arrays before I can do what I want with it.