Scaling of @threads for "embarrassingly parallel" problem

For future reference of other readers here:
Everyone who takes a parallel computing course is taught the golden rule: first squeeze the most performance out of your serial code, and only then parallelize it.
Allocations can require syscalls, which hand control over to the operating system at unpredictable moments, and are thus a problem in hot loops.
I agree that tooling could be improved to pinpoint more quickly where such allocations happen, but perhaps this @deallocate path is not the best way forward.
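
To make the point about allocations in hot loops concrete, here is a minimal sketch. It is not the code from this thread: `work_alloc`, `work_prealloc!`, `run_alloc` and `run_prealloc` are hypothetical names, and the workload is invented purely for illustration. The first version allocates a temporary vector for every item inside the `@threads` loop; the second splits the (non-empty) input into one chunk per thread and reuses a single buffer per task.

```julia
using Base.Threads

# Hypothetical per-item workload: allocates a fresh temporary on every call.
function work_alloc(x::Float64, n::Int)
    tmp = [x * k for k in 1:n]   # heap allocation inside the hot loop
    return sum(tmp)
end

# Same arithmetic, but writes into a caller-supplied buffer instead.
function work_prealloc!(buf::Vector{Float64}, x::Float64)
    @inbounds for k in eachindex(buf)
        buf[k] = x * k
    end
    return sum(buf)
end

# Straightforward version: one allocation per loop iteration.
function run_alloc(xs::Vector{Float64}, n::Int)
    out = similar(xs)
    @threads for i in eachindex(xs)
        out[i] = work_alloc(xs[i], n)
    end
    return out
end

# Chunked version: one buffer per task, reused for every item in the chunk.
# Assumes xs is non-empty.
function run_prealloc(xs::Vector{Float64}, n::Int)
    out = similar(xs)
    chunks = Iterators.partition(eachindex(xs), cld(length(xs), nthreads()))
    @sync for chunk in chunks
        Threads.@spawn begin
            buf = Vector{Float64}(undef, n)  # allocated once per task
            for i in chunk
                out[i] = work_prealloc!(buf, xs[i])
            end
        end
    end
    return out
end

xs = rand(1_000)
n = 256
@assert run_alloc(xs, n) ≈ run_prealloc(xs, n)  # both versions agree
```

Comparing `@allocated run_alloc(xs, n)` with `@allocated run_prealloc(xs, n)` after a warm-up call should show where the allocations come from. How much the second version helps scaling depends on the workload and the number of threads, so measure before drawing conclusions.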
