Looks like your chip has a maximum memory bandwidth of 32 GB/s. You're churning through an 80 MB array, so a single pass over it can't take less than 80 MB / 32 GB/s = 2.5 ms, no matter how many cores you use. The closer you get to that floor, the harder it is to wring out more performance by throwing more processors at the problem, because the cores end up waiting on memory instead of computing. Try a more expensive calculation:
julia> @btime sum(a -> tan(a), $a)
155.060 ms (0 allocations: 0 bytes)
6.156371337551294e6
julia> @btime sum(a -> tan(a), $adist)
75.691 ms (310 allocations: 12.44 KiB)
6.156371337551294e6
(results from a dual-core, hyperthreaded i5-6267U with four workers)
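If you want to sanity-check the numbers yourself, here's a rough back-of-the-envelope sketch. The 32 GB/s figure and the two timings are the ones quoted above; the element count of 10^7 Float64 values is my assumption for how you get to 80 MB, so adjust it to match your actual array.

```julia
# Assumed inputs: peak bandwidth from the chip's spec, array size from the post.
bandwidth = 32e9                      # bytes per second (32 GB/s)
nbytes = 10^7 * sizeof(Float64)       # assumed 10^7 Float64 elements = 80 MB

# Lower bound for one full pass over the array: about 2.5 ms.
floor_ms = nbytes / bandwidth * 1e3

# Speedup of the distributed tan sum over the serial one, from the timings above.
serial_ms = 155.060
dist_ms = 75.691
speedup = serial_ms / dist_ms

println("bandwidth floor: ", floor_ms, " ms")
println("speedup: ", round(speedup, digits=2), "x")
```

With the tan version both timings sit far above the 2.5 ms floor, so the work is compute-bound and the extra workers can actually help. The speedup comes out at roughly 2x, which is about what you'd hope for from two physical cores.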