That doesn’t seem to match what the documentation says. See https://docs.julialang.org/en/stable/manual/parallel-computing/#Parallel-Map-and-Loops-1, last paragraph:
"@parallel for can handle situations where each iteration is tiny, perhaps merely summing two numbers."
When I was trying out Julia’s parallel computing capabilities a few weeks ago, I was also surprised by a lack of performance, both with @parallel and with @threads. For @threads, it turned out to be completely due to the closure performance issue (see Parallelizing for loop in the computation of a gradient - #7 by tkoolen). I didn’t look into the details of @parallel, but does it also generate a closure?
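
For reference, here is a minimal sketch of the kind of closure problem I mean with @threads. It assumes the slowdown comes from a captured variable that is assigned more than once, which the compiler may then box. The function names scale_boxed! and scale_let! are made up for illustration and are not taken from the linked thread.

```julia
using Base.Threads

# Hypothetical example: `factor` is assigned twice and then captured by the
# closure that @threads creates for the loop body, so it may be boxed and
# every use inside the loop may be dynamically dispatched.
function scale_boxed!(out, xs)
    factor = 1.0
    factor = 2.0 * factor
    @threads for i in eachindex(xs)
        out[i] = factor * xs[i]
    end
    return out
end

# Same computation, but the `let` block introduces a fresh binding that is
# assigned exactly once, so the closure can capture it with a concrete type.
function scale_let!(out, xs)
    factor = 1.0
    factor = 2.0 * factor
    let factor = factor
        @threads for i in eachindex(xs)
            out[i] = factor * xs[i]
        end
    end
    return out
end

xs = collect(1.0:10.0)
boxed_result = scale_boxed!(similar(xs), xs)
let_result = scale_let!(similar(xs), xs)
```

Both versions compute the same thing, so any difference shows up only in timing and in the @code_warntype output, not in the results.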