Distributed, AllReduce, and Distributed Training

Continuing the discussion from Julia Distributed, AllReduce, and Distributed Training:

I’m working with Distributed Julia across multiple hosts and want to write code that does distributed training of a machine learning model on partitioned data. I don’t want to use MPI. I have the ClusterManager working correctly and can call addproc_mysystem(n) without any issues.

The traditional way to do distributed training is to put an identical copy of the model on each worker, partition the training data, and give each worker one partition. Each worker computes a gradient with respect to its portion of the training data, then calls AllReduce(+, gradients) / num_workers. The averaged gradients from the AllReduce are what’s used to update the model, and since every worker applies the same update, the replicas stay identical.
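In pseudocode, the per-worker step I have in mind looks roughly like this; every name below except nworkers() is a placeholder for something I’d still have to write (allreduce in particular is the thing Question 1 is about):

```julia
using Distributed

# Rough shape of one training step on each worker. `gradient`, `model`,
# `shard`, `update!`, and `allreduce` are all hypothetical placeholders.
@everywhere function train_step!()
    g = gradient(model, shard)          # gradient on this worker's data shard
    g = allreduce(+, g) ./ nworkers()   # average the gradients across workers
    update!(model, g)                   # same update everywhere keeps replicas in sync
end
```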

Question 1: what’s the best strategy to implement an AllReduce with Julia’s Distributed module?
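The naive baseline I can think of is gather–reduce–broadcast through the master. A minimal sketch, assuming each worker stores its gradient in a global grads in its Main:

```julia
using Distributed

# Gather every worker's `grads` to the master, reduce there, then push the
# result back into each worker's Main. Simple, but the master is a bottleneck
# and the bandwidth cost is far from a ring AllReduce.
function naive_allreduce(op)
    parts = [remotecall_fetch(() -> Main.grads, p) for p in workers()]  # gather
    total = reduce(op, parts)                                           # reduce on master
    for p in workers()
        remotecall_wait(Core.eval, p, Main, :(grads = $total))          # broadcast back
    end
    return total
end
```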

Question 2: @everywhere Z = 1 lets me declare the variable Z in each worker’s Main namespace, but the value of Z is not shared between workers (right?). If I do @everywhere ch3 = RemoteChannel(()->Channel{Int}(10), 3), how does Julia know that the ch3 in each worker’s namespace references the same underlying RemoteChannel?
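Related to this: one way I’ve found to be absolutely sure every worker holds the same handle is to construct the RemoteChannel once on the master and interpolate it into @everywhere, rather than re-running the constructor on each worker. A sketch, assuming workers with pids 2 and 3 exist:

```julia
using Distributed

ch3 = RemoteChannel(() -> Channel{Int}(10), 3)  # constructed once, backed by worker 3
@everywhere ch3 = $ch3                          # ship the SAME handle to every worker

# sanity check: a value put! on worker 2 can be take!n on worker 3
@spawnat 2 put!(ch3, 7)
fetch(@spawnat 3 take!(ch3))  # == 7
```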
As a more complicated example, in order to implement a ring AllReduce, I’m doing

@everywhere channelTable = [(RemoteChannel(()->Channel{Int}(10), w_i), RemoteChannel(()->Channel{Int}(10), w_i)) for w_i in workers()]

channelTable[2][1] does seem to reference the same RemoteChannel no matter where @spawnat 2 put!(channelTable[2][1], 7) and @spawnat 3 take!(channelTable[2][1]) are run. Is this really doing what I think it is, i.e. passing a value from worker 2 to worker 3?
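For what it’s worth, here is the scalar version of the ring pass I’m aiming for, written with a channel table built once on the master and shipped to the workers (so the identity question above doesn’t bite). A sketch, not tested beyond small cases:

```julia
using Distributed

nw = nworkers()
# inbox[i] is backed by the i-th worker and acts as that worker's receive queue
inbox = [RemoteChannel(() -> Channel{Int}(10), w) for w in workers()]
@everywhere inbox = $inbox

@everywhere function ring_allreduce(x::Int, i::Int, nw::Int)
    right = mod1(i + 1, nw)        # ring neighbor to send to
    acc = x                        # running sum
    msg = x                        # value being forwarded this round
    for _ in 1:nw-1
        put!(inbox[right], msg)    # pass my current message to the right
        msg = take!(inbox[i])      # receive the message coming from the left
        acc += msg
    end
    return acc
end

# every worker contributes its pid; all of them should end up with sum(workers())
futs = [@spawnat w ring_allreduce(myid(), i, nw) for (i, w) in enumerate(workers())]
fetch.(futs)
```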

Edit: I don’t know how to move or link a post between categories. If someone lets me know the preferred way of doing this, I’ll be happy to do it.

I’m currently trying to implement an equivalent of the MPI_Reduce operation in Julia. You may want to check out this similar thread: How to sum the chunks of a distributed array using a binary reduction tree?
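The rough shape I’ve been experimenting with for the tree version is a master-driven pairing pass, sketched below; localgrad is a hypothetical per-worker global, and op must be associative. Note this reduces to a single result (MPI_Reduce-style), not an AllReduce:

```julia
using Distributed

# Combine worker-held partial results pairwise, halving the number of
# outstanding Futures each level, until one value remains.
function tree_reduce(op, pids::Vector{Int})
    refs = [@spawnat p Main.localgrad for p in pids]   # one Future per worker
    while length(refs) > 1
        next = Future[]
        for i in 1:2:length(refs)-1
            a, b = refs[i], refs[i+1]
            # combine each pair on the worker already holding `a`
            push!(next, @spawnat a.where op(fetch(a), fetch(b)))
        end
        isodd(length(refs)) && push!(next, refs[end])  # odd one out rides along
        refs = next
    end
    return fetch(refs[1])
end
```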

I’m not sure either whether this is the correct section to post in…