I’m trying to improve the threaded performance of some code: I’ve written up the associated MWE.
Essentially we have a matrix m, which is mutated in two consecutive double for loops.
The first loop:
The first for loop iterates over a subset of the rows and all of the columns. For any given row in that subset, any column of the matrix may be mutated (the column index is chosen by a random draw).
The second loop:
This is more straightforward in that it iterates over all rows and columns: for all rows i and columns j, it mutates m[i,j] in a way that depends on the value of m[i,j] at the end of the first loop.
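For anyone reading without the MWE to hand, the two loops described above could look roughly like this. It is only a sketch of the described structure, not the MWE itself: the names first_loop! and second_loop!, the += 1.0 and 2x + 1 updates, and the choice of one random draw per column are all placeholders I've assumed for illustration.

```julia
using Random

# Hypothetical stand-in for the first loop: for each row in `rows`, draw a
# random column index (once per column) and mutate that entry of the row.
function first_loop!(m::Matrix{Float64}, rows, rng::AbstractRNG)
    ncols = size(m, 2)
    for i in rows
        for _ in 1:ncols
            j = rand(rng, 1:ncols)   # any column of row i may be hit
            m[i, j] += 1.0           # placeholder mutation
        end
    end
    return m
end

# Hypothetical stand-in for the second loop: every entry is updated based on
# the value it holds once the first loop has finished.
function second_loop!(m::Matrix{Float64})
    for i in axes(m, 1), j in axes(m, 2)
        m[i, j] = 2 * m[i, j] + 1    # placeholder update
    end
    return m
end

m = zeros(4, 5)
first_loop!(m, 1:2:4, MersenneTwister(1))  # only rows 1 and 3 are in the subset
second_loop!(m)
```

The point to notice is that the first loop only ever writes to the row it is currently working on, which is what makes it possible to run different rows independently.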
The MWE runs the code via the straightforward method1, which has two consecutive nested double for loops, and then by a second method that @spawns individual iterations of the first loop, which are waited for in the second loop.
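To make the spawn-and-wait idea concrete, here is a rough sketch of how it could be arranged. Again this is not the MWE: mutate_row!, spawn_then_wait!, the seed keyword, and the placeholder updates are all assumed names and choices, and the second loop is left serial here for simplicity.

```julia
using Base.Threads: @spawn
using Random

# Hypothetical per-row work for the first loop: mutate randomly drawn
# columns of row i only.
function mutate_row!(m, i, rng)
    ncols = size(m, 2)
    for _ in 1:ncols
        m[i, rand(rng, 1:ncols)] += 1.0   # placeholder mutation
    end
    return nothing
end

function spawn_then_wait!(m::Matrix{Float64}, rows; seed = 1)
    # One task per first-loop row. Each task creates its own RNG and writes
    # only to its own row, so the tasks never write to the same entry.
    tasks = Dict{Int,Task}()
    for i in rows
        tasks[i] = @spawn mutate_row!(m, i, MersenneTwister(seed + i))
    end
    for i in axes(m, 1)
        # Block only on the task that wrote row i, if there is one.
        haskey(tasks, i) && wait(tasks[i])
        for j in axes(m, 2)
            m[i, j] = 2 * m[i, j] + 1     # placeholder update
        end
    end
    return m
end

m = spawn_then_wait!(zeros(4, 5), [1, 3])
```

Because the second loop only waits on the task for the row it is about to process, rows outside the subset can be handled while first-loop tasks for later rows are still running.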
If anyone can beat the performance of this, let me know!
On my machine (for each thread count, the first time printed is for method1 and the second is for the @spawn-based method):

    $ for i in 1 2 4 8; do echo $i; JULIA_NUM_THREADS=$i julia ThreadProblemMWE.jl; done
    1
     79.999939 seconds (201.69 k allocations: 5.648 MiB)
     61.961386 seconds (201.78 k allocations: 5.657 MiB)
    2
     39.999903 seconds (201.70 k allocations: 5.649 MiB)
     31.959529 seconds (201.78 k allocations: 5.658 MiB)
    4
     20.999882 seconds (201.72 k allocations: 5.652 MiB)
     16.960056 seconds (201.79 k allocations: 5.657 MiB)
    8
     11.504882 seconds (201.76 k allocations: 5.657 MiB)
      9.460095 seconds (201.80 k allocations: 5.660 MiB)