Which one is better for making inference fast, and why? Do they have any effect if you are using a single chain?
In my experience, they have similar performance. You can run a benchmark to verify for your use case. Using a single distributed or threaded chain will not provide a performance advantage because sampling is performed serially within a chain.
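A minimal sketch of such a benchmark, assuming the question is about Turing.jl's `MCMCThreads()` and `MCMCDistributed()` (the thread does not name the library, so treat that as an assumption). The toy model `demo`, the data, the chain count, and the sample sizes are all hypothetical placeholders; swap in your own model. Julia needs to be started with several threads (for example `julia --threads 4`) for the threaded run to be parallel at all.

```julia
using Distributed
addprocs(4)                       # worker processes, used by MCMCDistributed()

@everywhere using Turing          # the package must be loaded on every worker

# Hypothetical toy model, defined on all workers; replace with your own.
@everywhere @model function demo(y)
    μ ~ Normal(0, 1)
    σ ~ Exponential(1)
    for i in eachindex(y)
        y[i] ~ Normal(μ, σ)
    end
end

model = demo(randn(100))          # synthetic data, for illustration only
nchains = 4

# Short warm-up runs so that compilation time is not included in the timings.
sample(model, NUTS(), MCMCThreads(), 10, nchains; progress=false)
sample(model, NUTS(), MCMCDistributed(), 10, nchains; progress=false)

# Time both backends with the same sampler, chain count, and chain length.
t_threads = @elapsed sample(model, NUTS(), MCMCThreads(), 1_000, nchains; progress=false)
t_distributed = @elapsed sample(model, NUTS(), MCMCDistributed(), 1_000, nchains; progress=false)

println("threads:     ", round(t_threads; digits=2), " s")
println("distributed: ", round(t_distributed; digits=2), " s")
```

A single timing like this is noisy, so it is probably worth repeating each run a few times before drawing conclusions. With `nchains = 1` neither backend should help, since each chain is still sampled serially.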
I have used both methods on the same model. My casual observation is that Distributed is somewhat faster. I believe this is because garbage collection triggered on one thread pauses all the threads, whereas separate worker processes collect garbage independently. I will try to do a more careful comparison before long.
At the moment I also see similar results, as you said.