Yeah, with @threads there always seems to be some reason to manually split the workload up over the threads.
- In this case it’s because you want each thread to reduce into its own local variable before summing the per-thread results together. The same issue comes up in Parallel is very slow - #17 by Elrod.
- Another case: Parallelizing for loop in the computation of a gradient - #18 by saschatimme (perhaps not as frequently encountered as the first, but still)
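To make the first case concrete, here is a minimal sketch of the manual-splitting pattern I mean: each thread accumulates into its own slot of a pre-allocated array, and the slots are only combined at the end. (`threaded_sum` is just a name I made up for illustration.)

```julia
using Base.Threads

# Per-thread partial sums: each thread accumulates into its own slot,
# so there is no race on a shared accumulator. The slots are combined
# once, after the threaded loop finishes.
# (Caveat: adjacent slots can cause false sharing; padding helps, but
# this is the bare-bones version of the pattern.)
function threaded_sum(xs)
    partials = zeros(eltype(xs), nthreads())
    @threads for i in eachindex(xs)
        partials[threadid()] += xs[i]
    end
    return sum(partials)
end
```

The point is that none of this boilerplate has anything to do with the problem being solved; it is all there just to work around the lack of a built-in threaded reduction.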
Maybe @threads in its current form is just not the right abstraction? At the very least, a mapreduce-style version of @threads, similar to @parallel, would be pretty useful, I think.
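Something like this hypothetical `tmapreduce` is what I have in mind; the name, signature, and partitioning scheme are all just for illustration, not an existing API:

```julia
using Base.Threads

# Hypothetical mapreduce-style threaded reduction: partition the input
# into one contiguous chunk per thread, reduce each chunk independently,
# then fold the per-chunk results together on the main thread.
function tmapreduce(f, op, xs)
    nt = min(nthreads(), length(xs))          # never more chunks than elements
    bounds = round.(Int, range(0, length(xs); length = nt + 1))
    results = Vector{Any}(undef, nt)
    @threads for t in 1:nt
        chunk = view(xs, bounds[t]+1:bounds[t+1])
        results[t] = mapreduce(f, op, chunk)  # serial reduce within the chunk
    end
    return reduce(op, results)
end
```

With that, the sum-of-squares example from the threads above would just be `tmapreduce(x -> x^2, +, xs)`, with no thread-local bookkeeping in user code.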