Question to the experts on distributed computing:
How should we design distributed reinforcement learning methods in Julia?
Ray has a nice approach here, with different optimizers for methods that move gradients or batches of experience around, either synchronously or asynchronously.
As a super simple toy example of a method where agents collect experience on their own worker processes, send it to the learner, and receive the updated policy, I wrote the following:
```julia
using Distributed

# Make sure there are worker processes (alternatively start Julia with `julia -p N`).
nprocs() == 1 && addprocs(4)

@everywhere function actor(policy_ch, result_ch)
    while true
        isdone, policy = take!(policy_ch)            # block until the learner sends a policy
        action = randn() + policy                    # sample an action around the policy mean
        reward = 2 < action < 3                      # reward 1 iff the action lands in (2, 3)
        put!(result_ch, (myid(), action, reward))    # send the experience back to the learner
        isdone && return
    end
end

function learner(; policyinit = 0.0, η = 1e-2, T = 10^4)
    n = nworkers()
    policy_chs = [RemoteChannel(() -> Channel{Tuple{Bool,Float64}}(1)) for _ in 1:n]
    result_chs = [RemoteChannel(() -> Channel{Tuple{Int,Float64,Bool}}(1)) for _ in 1:n]
    for (i, w) in enumerate(workers())
        remote_do(actor, w, policy_chs[i], result_chs[i])
    end
    policy = policyinit
    for step in 1:T
        for i in 1:n
            put!(policy_chs[i], (step == T, policy))   # broadcast the current policy
            id, action, reward = take!(result_chs[i])  # wait for one experience per actor
            policy += η * reward * (action - policy)   # toy update: move towards rewarded actions
        end
    end
    policy
end

learner()
```
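
For the asynchronous flavour, what I picture is roughly the untested sketch below: all actors push into one shared experience channel, read the latest policy from a one-element channel that serves as a blackboard, and the learner updates as soon as anything arrives (the names `async_actor`, `policy_box` and `experience` are just placeholders):

```julia
using Distributed
nprocs() == 1 && addprocs(4)

# Actors never wait for the learner: they peek at the latest policy and keep
# pushing experience into one shared channel. (They also never terminate in
# this sketch; a real version would need a stop signal like the isdone flag above.)
@everywhere function async_actor(policy_box, experience)
    while true
        policy = fetch(policy_box)                  # peek at the latest policy without removing it
        action = randn() + policy
        reward = 2 < action < 3
        put!(experience, (myid(), action, reward))  # push and immediately continue
    end
end

function async_learner(; policyinit = 0.0, η = 1e-2, T = 10^4)
    experience = RemoteChannel(() -> Channel{Tuple{Int,Float64,Bool}}(1024))
    policy_box = RemoteChannel(() -> Channel{Float64}(1))
    put!(policy_box, policyinit)
    foreach(w -> remote_do(async_actor, w, policy_box, experience), workers())
    policy = policyinit
    for _ in 1:T
        id, action, reward = take!(experience)       # consume whichever actor reports first
        policy += η * reward * (action - policy)
        take!(policy_box); put!(policy_box, policy)  # publish the new policy (not atomic, good enough here)
    end
    policy
end

async_learner()
```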
- Does this look like a reasonable approach?
- This works for CPUs. Can I follow a similar pattern with GPUs? (A rough sketch of what I have in mind is below.)
- I have no clue about the performance of `RemoteChannel`. Do you think it will be possible to get competitive performance with this approach, compared to e.g. Ray? (A naive round-trip timing sketch is below.)
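
Regarding the GPU question: what I have in mind (untested, and assuming CUDA.jl is installed and a GPU is visible on the worker) is that each actor would do a batched rollout on its device and ship plain `Array`s back to the learner, because I would not try to send `CuArray`s across processes. A minimal sketch with a single remote call instead of the channel loop:

```julia
using Distributed
nprocs() == 1 && addprocs(1)
@everywhere using CUDA          # assumes CUDA.jl works on the worker

# Batched rollout on the worker's GPU; results are copied back to host memory
# before they are serialized to the learner.
@everywhere function gpu_rollout(policy, batchsize)
    actions = policy .+ CUDA.randn(batchsize)    # batch of actions on the device
    rewards = (2 .< actions) .& (actions .< 3)   # elementwise reward
    return Array(actions), Array(rewards)        # back to plain Arrays on the host
end

actions, rewards = remotecall_fetch(gpu_rollout, workers()[1], 0.5f0, 1024)
```

In the real actor loop the put!/take! pattern would stay the same, just with the RemoteChannels carrying `Vector{Float32}`/`Vector{Bool}` batches instead of scalars.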
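
And regarding `RemoteChannel` performance, the naive round-trip timing I would start with looks like this (a ping-pong between the main process and one worker; nothing rigorous):

```julia
using Distributed, Statistics
nprocs() == 1 && addprocs(1)

# The worker simply echoes back whatever arrives (it never terminates in this sketch).
@everywhere function echo(ping, pong)
    while true
        put!(pong, take!(ping))
    end
end

ping = RemoteChannel(() -> Channel{Float64}(1))
pong = RemoteChannel(() -> Channel{Float64}(1))
remote_do(echo, workers()[1], ping, pong)

roundtrip(x) = (put!(ping, x); take!(pong))
roundtrip(0.0)                                   # warm-up / compilation
times = [@elapsed roundtrip(rand()) for _ in 1:1000]
println("median round trip: ", round(median(times) * 1e6, digits = 1), " μs")
```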