How to design distributed (CPU/GPU) reinforcement learning methods?



Question for the distributed computing experts:
How should we design distributed reinforcement learning methods in Julia?
Ray has a nice approach, with different optimizers for methods that move around either gradients or batches of experience, synchronously or asynchronously.

As a super simple toy example of a method where agents collect experience on their own worker processes, send it to the learner, and receive the updated policy, I wrote the following:

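using Distributed  # needs worker processes: start with e.g. `julia -p 4` or call addprocs first

@everywhere begin
    # Runs on a worker: receive the current policy, sample an action, report the reward.
    function actor(in, out)
        while true
            isdone, policy = take!(in)            # blocks until the learner sends a policy
            action = randn() + policy             # noisy action around the current policy
            reward = 2 < action < 3               # Bool reward: true if the action lands in (2, 3)
            put!(out, (myid(), action, reward))
            isdone && return                      # stop after the final step
        end
    end
end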

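# Runs on the main process: sends the policy to each actor and updates it from the replies.
function learner(; policyinit = 0., η = 1e-2, T = 10^4)
    # One pair of channels per worker; this assumes the worker ids are 2:nprocs().
    in = [RemoteChannel(()->Channel{Tuple{Bool,Float64}}(1)) for _ in 1:nprocs() - 1]
    out = [RemoteChannel(()->Channel{Tuple{Int,Float64,Bool}}(1)) for _ in 1:nprocs() - 1]
    for i in 1:nprocs() - 1
        remote_do(actor, i+1, in[i], out[i])
    end
    policy = policyinit
    for step in 1:T
        for i in 1:nprocs() - 1
            put!(in[i], (step == T, policy))      # the flag tells the actor to stop after step T
            id, action, reward = take!(out[i])
            policy += η * reward * (action - policy)
        end
    end
    return policy  # returning the learned policy is an assumption; the original snippet ends above
end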

  1. Does this look like a reasonable approach?
  2. This works for CPUs. Can I follow a similar pattern with GPUs?
  3. I have no clue about the performance of RemoteChannel. Do you think this approach can be competitive with, e.g., Ray?
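
For question 3, a first number could come from timing a round trip through a pair of RemoteChannels. The following is only a rough sketch, not a proper benchmark: `echo_loop` and `roundtrip_time` are names made up for illustration, and it assumes that a single local worker is representative, which is probably not true for a multi-machine setup.

```julia
using Distributed

# Assumption: one local worker is enough to get a first impression.
nprocs() == 1 && addprocs(1)

# Hypothetical helper: send every message straight back until `nothing` arrives.
@everywhere function echo_loop(inbox, outbox)
    while true
        msg = take!(inbox)
        msg === nothing && return
        put!(outbox, msg)
    end
end

# Hypothetical helper: mean time in seconds for one put!/take! round trip.
function roundtrip_time(n = 1_000)
    w = first(workers())
    to_worker = RemoteChannel(() -> Channel{Union{Nothing,Float64}}(1))
    from_worker = RemoteChannel(() -> Channel{Float64}(1))
    remote_do(echo_loop, w, to_worker, from_worker)

    # Warm-up round trip so that compilation is not part of the measurement.
    put!(to_worker, 0.0)
    take!(from_worker)

    t = @elapsed for i in 1:n
        put!(to_worker, Float64(i))
        take!(from_worker)
    end

    put!(to_worker, nothing)  # tell the worker loop to stop
    return t / n
end

println("mean round trip: ", roundtrip_time(), " s")
```

Comparing this number with the time one environment step and one policy update take should give a hint whether the channel overhead matters for a given problem.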