With this relatively inexpensive task, the threaded version is almost 4x faster than serial, and the distributed versions are about 2x faster. I was surprised that the RemoteChannel version is a little slower than pmap. I anticipated it would be faster, since the work functions run continuously, while pmap launches new function calls at each step of the outer loop. However, I don’t have a great understanding of distributed computing. Is there something sub-optimal about what I’ve written?
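For anyone reading without the gist open, the two patterns being compared look roughly like the sketch below. This is not the code from the gist: `work_loop`, `run_channels`, the squaring task, and the choice of two local workers are all hypothetical stand-ins, just to show a long-running worker loop fed through a `RemoteChannel` next to a plain `pmap` call.

```julia
using Distributed

# Assumption: start two local workers if none have been added yet.
nprocs() == 1 && addprocs(2)

# Hypothetical work function: loops on a worker until it receives `nothing`.
@everywhere function work_loop(jobs, results)
    while true
        x = take!(jobs)
        x === nothing && break      # sentinel: stop this worker's loop
        put!(results, (x, x^2))     # stand-in for the real computation
    end
end

function run_channels(inputs)
    jobs    = RemoteChannel(() -> Channel{Union{Int,Nothing}}(32))
    results = RemoteChannel(() -> Channel{Tuple{Int,Int}}(32))

    # Start one long-running loop per worker.
    for p in workers()
        remote_do(work_loop, p, jobs, results)
    end

    out = Dict{Int,Int}()
    @sync begin
        # Feed jobs asynchronously, then send one stop sentinel per worker.
        @async begin
            for x in inputs
                put!(jobs, x)
            end
            for _ in workers()
                put!(jobs, nothing)
            end
        end
        # Collect exactly one result per input.
        for _ in inputs
            x, y = take!(results)
            out[x] = y
        end
    end
    return out
end

squares_pmap = pmap(x -> x^2, 1:10)   # pmap schedules one remote call per element
squares_chan = run_channels(1:10)     # workers stay in their loop and pull jobs
```

Note that in the channel version every `take!` and `put!` on a worker is itself a round trip to the process that owns the channel, so for cheap tasks the per-item communication cost may not be lower than what `pmap` pays.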
I took a look at your notebook and it doesn’t seem to have the RemoteChannel version you refer to. I’m curious to know why the pmap version of your computation might be faster than the RemoteChannel version. One complete guess is that the pmap version is using a CachingPool while the RemoteChannel version is not, so the RemoteChannel version re-sends some piece of data multiple times while the pmap version only has to send it once. Just a guess, but it might be worth looking into.
Sorry, I had updated that gist and it looks like I took out the RemoteChannel part. I brought back the old version and updated the post.
I’m not using a CachingPool in the pmap part, unless it automatically uses one. I don’t think there is any extra data in the closure anyway. All the data is either in the module or explicitly transferred.
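For reference, `pmap` only uses a `CachingPool` when one is passed explicitly; with no pool argument it uses the default worker pool. A minimal sketch of what opting in looks like, assuming a hypothetical payload `big` captured by the closure (nothing here is taken from the gist):

```julia
using Distributed

# Assumption: start two local workers if none have been added yet.
nprocs() == 1 && addprocs(2)

function run_cached(n)
    big  = rand(10^6)                 # hypothetical data captured by the closure
    pool = CachingPool(workers())     # caches the closure on each worker
    f    = i -> sum(big) + i          # `big` is sent once per worker, not once per call
    results = pmap(f, pool, 1:n)
    clear!(pool)                      # drop the cached closure from the workers
    return results
end

cached = run_cached(8)
```

If the closure captures nothing large, as described above, a `CachingPool` would not be expected to change much.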
The speed difference is only about 10% anyway, so it’s not critical. I had just hoped the RemoteChannel would beat pmap and get the performance closer to that of threads.