Very poor performance with DistributedArrays?

It looks like your chip has a maximum memory bandwidth of 32 GB/s. You're streaming through an 80 MB array, so a single pass over it has a hard floor of 80 MB / 32 GB/s = 2.5 ms, no matter how many processors you use. The closer you get to that floor, the harder it is to wring out additional gains by throwing more processors at the problem, because the work is limited by memory traffic rather than arithmetic. Try a more expensive calculation:

julia> @btime sum(a -> tan(a), $a)
  155.060 ms (0 allocations: 0 bytes)
6.156371337551294e6

julia> @btime sum(a -> tan(a), $adist)
  75.691 ms (310 allocations: 12.44 KiB)
6.156371337551294e6

(results from a dual-core, hyperthreaded i5-6267U with four workers)
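
For reference, here is a rough sketch of the arithmetic behind those numbers. The 80 MB and 32 GB/s figures and the two timings are taken from this post; the variable names are just for illustration, and the bandwidth is a peak value, so the real floor will be somewhat higher.

```julia
# Figures quoted in this post (peak bandwidth; real throughput will be lower).
array_bytes = 80e6              # 80 MB array
bandwidth_bytes_per_s = 32e9    # 32 GB/s maximum memory bandwidth

# Minimum time for one pass over the array if memory is the only limit.
floor_ms = array_bytes / bandwidth_bytes_per_s * 1e3

# Timings from the tan benchmark above, in milliseconds.
serial_ms = 155.060
distributed_ms = 75.691
speedup = serial_ms / distributed_ms

println("bandwidth floor (ms): ", floor_ms)
println("speedup from distributing: ", round(speedup, digits = 2))
```

Both tan timings sit far above the bandwidth floor, so that benchmark is compute-bound, and the roughly 2x speedup is about what you would expect from two physical cores.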
