How to efficiently handle parallelism on a DistributedArray

Actually, I find that two parts of my work are the most time-consuming:
