Trying to understand parallel performance in Turing

Since more than 50% of the time is spent in GC, one possible reason for the performance difference between Threads and Distributed is how garbage collection interacts with each. With Distributed, every worker is a separate process with its own heap, so the workers can collect garbage independently and in parallel. With Threads, all threads share one heap, and a collection triggered by any thread pauses all of them. The memory allocations themselves could also be limiting the parallel speedup. This should be less of a problem in a bigger model, where more of the time goes to computation rather than GC.
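
One way to check whether GC dominates in a threaded run is to compare `gctime` against total wall time from `@timed`. The snippet below is only a rough sketch: `allocating_work` and `gc_fraction` are hypothetical names, and the loop is an artificial allocation-heavy stand-in, not the Turing model from this thread.

```julia
# Hypothetical allocation-heavy workload standing in for a small model's inner loop.
function allocating_work(n)
    total = 0.0
    for _ in 1:n
        v = rand(100)        # fresh heap allocation on every iteration
        total += sum(v)
    end
    return total
end

# Run the workload once per available thread and return the share of wall time spent in GC.
function gc_fraction(n)
    stats = @timed begin
        Threads.@threads for _ in 1:Threads.nthreads()
            allocating_work(n)
        end
    end
    return stats.gctime / stats.time
end

gc_fraction(10)              # warm-up call so compilation is not part of the measurement
frac = gc_fraction(100_000)
println("fraction of wall time spent in GC: ", round(frac; digits = 3))
```

Starting Julia with several threads (for example `julia -t 4`) and comparing the printed fraction against a single-threaded run should give a rough idea of how much the shared-heap collections cost. The same `@timed` call can be wrapped around the actual sampling call instead of the toy loop.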
