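Being more explicit about the data movement is generally my preference as well, but it sounds like you might be able to get away with a CachingPool here. It caches functions on each worker, including closures and whatever they capture, so nm should only be shipped once per worker rather than on every call.
Something like this might work for you: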
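    using Distributed

    foo(k) = bar(nm, k)  # capture nm in a closure
    pool = CachingPool(workers())
    # one task per worker; each task pulls indices until the queue is empty
    @sync for _ in workers()
        @async while !isempty(queue)
            idx = pop!(queue)
            # the pool picks a free worker and reuses the cached foo on it
            results[idx] = remotecall_fetch(foo, pool, ks[idx])
        end
    end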
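
If you don't actually need the manual queue, the same pool can also be handed straight to pmap. Here is a minimal, self-contained sketch of that variant. The two local workers, the body of bar, and the values of nm and ks are all placeholders I made up so it runs on its own; swap in your real ones.

```julia
using Distributed

# Assumption: a local setup with two workers, added only if none exist yet.
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for the real `bar`; it must be defined on every worker.
@everywhere bar(nm, k) = sum(nm) * k

nm = rand(1_000)   # placeholder for the large captured data
ks = 1:10          # placeholder inputs

pool = CachingPool(workers())

# `let` makes nm a local, so the anonymous function is a true closure
# that the pool can cache (captured data included) on each worker.
results = let nm = nm
    pmap(k -> bar(nm, k), pool, ks)
end

clear!(pool)       # drop the cached closures from the workers when done
```

pmap does the work distribution for you, so this is mostly a fit when you don't care about the order in which items are picked up. The clear! call at the end is optional, but it frees the cached copy of nm on the workers.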