@distributed is not sharing the work as I would expect

Hi

I have a Monte Carlo simulation which I run with 289 different parameter sets.

I now realize that @distributed is not as smart as I thought (at least that is my current understanding).
The issue is that not every iteration takes the same amount of time.
As you can see in the log below, towards the end of the run only two of the workers (2 and 3) are still busy, and the other 13 are idle.

Which of the many packages (or concepts in Base) would be an easy replacement for @distributed, but with smarter scheduling?

Edit: it seems the work is shared evenly across workers by iteration count (the number of elements of the for loop each worker receives), as shown by this table.

worker  count
2       20
3       20
4       20
5       20
6       19
7       19
8       19
9       19
10      19
11      19
12      19
13      19
14      19
15      19
16      19
Total   289
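
The counts in the table, together with the index ranges in the log below (worker 2 reports Sensitivity 1 to 20, worker 3 reports 21 to 40, and so on), are consistent with the range 1:sz being cut into contiguous, near-equal blocks before any iteration runs. The following is a minimal sketch of such a static split. It is not the Distributed source: `static_chunks` is a hypothetical helper written only to reproduce the table.

```julia
# Hypothetical helper (not part of Distributed): split 1:n into np contiguous
# chunks of near-equal length; the first `n % np` chunks get one extra element.
function static_chunks(n::Int, np::Int)
    each, extras = divrem(n, np)
    chunks = UnitRange{Int}[]
    lo = 1
    for k in 1:np
        hi = lo + each - 1 + (k <= extras ? 1 : 0)
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end

# 289 scenarios over 15 workers (worker ids 2 to 16, as in the table)
chunks = static_chunks(289, 15)
for (worker, r) in zip(2:16, chunks)
    println("worker $worker: $(first(r)):$(last(r)) ($(length(r)) iterations)")
end
```

With a split like this, every worker gets the same number of iterations but not the same amount of work: if the expensive scenarios sit next to each other in the index range, they all land on the same one or two workers.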
# My code is roughly this:
using Distributed, Dates

keymetrics = Distributed.@distributed (vcat) for i in 1:sz
    t0 = Dates.now()
    # do a few things
    keym = deriveMetricsForThisResult!(simRes,gameToStringDict,refLosses,sett,sensmatinput,i,selectedKeymFunction,number_of_reinstatements,lim,ret,aggregateCover_perDrawRetention,tmpoutputfldr)

    # elapsed wall-clock time for this iteration, in whole seconds
    ela = round((Dates.now() - t0), Dates.Second).value
    println(string("$(ela) seconds - Sensitivity $(i)/$(sz) finished"))

    # keym is the return value for the vcat operation
    keym
end
julia> @time senskeym = simulateSensitivities(mainsett,sensmatinput[1:end,:],selectedKeymFunction,number_of_reinstatements,lim,ret,aggregateCover_perDrawRetention,outputfld);
[ Info: Running sensitivites...
[ Info: iterations = 1000000
[ Info: Number of scenarios = 289
[ Info: Running base model...
 62.175956 seconds (1.42 M allocations: 65.257 MiB, 0.03% gc time)
[ Info: Base model has finished.
      From worker 12:   19 seconds - Sensitivity 195/289 finished
      From worker 8:    20 seconds - Sensitivity 119/289 finished
      From worker 9:    20 seconds - Sensitivity 138/289 finished
      From worker 13:   21 seconds - Sensitivity 214/289 finished
      From worker 14:   21 seconds - Sensitivity 233/289 finished
      From worker 15:   27 seconds - Sensitivity 252/289 finished
      From worker 16:   28 seconds - Sensitivity 271/289 finished
      From worker 11:   33 seconds - Sensitivity 176/289 finished
      From worker 10:   33 seconds - Sensitivity 157/289 finished
      From worker 5:    39 seconds - Sensitivity 61/289 finished
      From worker 8:    19 seconds - Sensitivity 120/289 finished
      From worker 12:   19 seconds - Sensitivity 196/289 finished
      From worker 9:    20 seconds - Sensitivity 139/289 finished
      From worker 13:   20 seconds - Sensitivity 215/289 finished
      From worker 14:   20 seconds - Sensitivity 234/289 finished
      From worker 6:    41 seconds - Sensitivity 81/289 finished
      From worker 15:   27 seconds - Sensitivity 253/289 finished
      From worker 16:   28 seconds - Sensitivity 272/289 finished
      From worker 12:   19 seconds - Sensitivity 197/289 finished
      From worker 8:    19 seconds - Sensitivity 121/289 finished
      From worker 9:    19 seconds - Sensitivity 140/289 finished
      From worker 13:   20 seconds - Sensitivity 216/289 finished
      From worker 14:   20 seconds - Sensitivity 235/289 finished
      From worker 5:    23 seconds - Sensitivity 62/289 finished
      From worker 6:    23 seconds - Sensitivity 82/289 finished
      From worker 11:   32 seconds - Sensitivity 177/289 finished
      From worker 10:   32 seconds - Sensitivity 158/289 finished
      From worker 8:    19 seconds - Sensitivity 122/289 finished
      From worker 12:   20 seconds - Sensitivity 198/289 finished
      From worker 9:    19 seconds - Sensitivity 141/289 finished
      From worker 13:   20 seconds - Sensitivity 217/289 finished
      From worker 14:   20 seconds - Sensitivity 236/289 finished
      From worker 15:   28 seconds - Sensitivity 254/289 finished
      From worker 16:   28 seconds - Sensitivity 273/289 finished
      From worker 5:    23 seconds - Sensitivity 63/289 finished
      From worker 6:    23 seconds - Sensitivity 83/289 finished
      From worker 7:    91 seconds - Sensitivity 100/289 finished
      From worker 8:    20 seconds - Sensitivity 123/289 finished
      From worker 12:   20 seconds - Sensitivity 199/289 finished
      From worker 11:   33 seconds - Sensitivity 178/289 finished
      From worker 9:    20 seconds - Sensitivity 142/289 finished
      From worker 10:   33 seconds - Sensitivity 159/289 finished
      From worker 13:   20 seconds - Sensitivity 218/289 finished
      From worker 14:   20 seconds - Sensitivity 237/289 finished
      From worker 4:    100 seconds - Sensitivity 41/289 finished
      From worker 3:    104 seconds - Sensitivity 21/289 finished
      From worker 5:    22 seconds - Sensitivity 64/289 finished
      From worker 7:    19 seconds - Sensitivity 101/289 finished
      From worker 6:    22 seconds - Sensitivity 84/289 finished
      From worker 15:   28 seconds - Sensitivity 255/289 finished
      From worker 16:   28 seconds - Sensitivity 274/289 finished
      From worker 8:    20 seconds - Sensitivity 124/289 finished
      From worker 12:   20 seconds - Sensitivity 200/289 finished
      From worker 9:    20 seconds - Sensitivity 143/289 finished
      From worker 13:   20 seconds - Sensitivity 219/289 finished
      From worker 14:   20 seconds - Sensitivity 238/289 finished
      From worker 7:    19 seconds - Sensitivity 102/289 finished
      From worker 5:    22 seconds - Sensitivity 65/289 finished
      From worker 11:   33 seconds - Sensitivity 179/289 finished
      From worker 10:   33 seconds - Sensitivity 160/289 finished
      From worker 6:    22 seconds - Sensitivity 85/289 finished
      From worker 8:    19 seconds - Sensitivity 125/289 finished
      From worker 15:   27 seconds - Sensitivity 256/289 finished
      From worker 12:   19 seconds - Sensitivity 201/289 finished
      From worker 9:    20 seconds - Sensitivity 144/289 finished
      From worker 13:   20 seconds - Sensitivity 220/289 finished
      From worker 16:   28 seconds - Sensitivity 275/289 finished
      From worker 14:   20 seconds - Sensitivity 239/289 finished
      From worker 7:    19 seconds - Sensitivity 103/289 finished
      From worker 5:    22 seconds - Sensitivity 66/289 finished
      From worker 6:    23 seconds - Sensitivity 86/289 finished
      From worker 8:    19 seconds - Sensitivity 126/289 finished
      From worker 12:   19 seconds - Sensitivity 202/289 finished
      From worker 9:    20 seconds - Sensitivity 145/289 finished
      From worker 13:   20 seconds - Sensitivity 221/289 finished
      From worker 14:   20 seconds - Sensitivity 240/289 finished
      From worker 11:   32 seconds - Sensitivity 180/289 finished
      From worker 15:   27 seconds - Sensitivity 257/289 finished
      From worker 10:   33 seconds - Sensitivity 161/289 finished
      From worker 16:   27 seconds - Sensitivity 276/289 finished
      From worker 7:    20 seconds - Sensitivity 104/289 finished
      From worker 5:    22 seconds - Sensitivity 67/289 finished
      From worker 8:    19 seconds - Sensitivity 127/289 finished
      From worker 12:   20 seconds - Sensitivity 203/289 finished
      From worker 2:    177 seconds - Sensitivity 1/289 finished
      From worker 4:    77 seconds - Sensitivity 42/289 finished
      From worker 6:    24 seconds - Sensitivity 87/289 finished
      From worker 13:   20 seconds - Sensitivity 222/289 finished
      From worker 3:    76 seconds - Sensitivity 22/289 finished
      From worker 14:   20 seconds - Sensitivity 241/289 finished
      From worker 7:    19 seconds - Sensitivity 105/289 finished
      From worker 9:    33 seconds - Sensitivity 146/289 finished
      From worker 15:   28 seconds - Sensitivity 258/289 finished
      From worker 8:    20 seconds - Sensitivity 128/289 finished
      From worker 12:   20 seconds - Sensitivity 204/289 finished
      From worker 11:   33 seconds - Sensitivity 181/289 finished
      From worker 16:   30 seconds - Sensitivity 277/289 finished
      From worker 10:   34 seconds - Sensitivity 162/289 finished
      From worker 5:    24 seconds - Sensitivity 68/289 finished
      From worker 13:   21 seconds - Sensitivity 223/289 finished
      From worker 6:    23 seconds - Sensitivity 88/289 finished
      From worker 7:    20 seconds - Sensitivity 106/289 finished
      From worker 14:   27 seconds - Sensitivity 242/289 finished
      From worker 8:    20 seconds - Sensitivity 129/289 finished
      From worker 12:   21 seconds - Sensitivity 205/289 finished
      From worker 15:   28 seconds - Sensitivity 259/289 finished
      From worker 13:   21 seconds - Sensitivity 224/289 finished
      From worker 5:    23 seconds - Sensitivity 69/289 finished
      From worker 16:   28 seconds - Sensitivity 278/289 finished
      From worker 9:    34 seconds - Sensitivity 147/289 finished
      From worker 7:    20 seconds - Sensitivity 107/289 finished
      From worker 6:    24 seconds - Sensitivity 89/289 finished
      From worker 11:   33 seconds - Sensitivity 182/289 finished
      From worker 10:   34 seconds - Sensitivity 163/289 finished
      From worker 14:   28 seconds - Sensitivity 243/289 finished
      From worker 8:    21 seconds - Sensitivity 130/289 finished
      From worker 12:   20 seconds - Sensitivity 206/289 finished
      From worker 13:   21 seconds - Sensitivity 225/289 finished
      From worker 5:    24 seconds - Sensitivity 70/289 finished
      From worker 7:    20 seconds - Sensitivity 108/289 finished
      From worker 15:   29 seconds - Sensitivity 260/289 finished
      From worker 2:    74 seconds - Sensitivity 2/289 finished
      From worker 6:    25 seconds - Sensitivity 90/289 finished
      From worker 16:   29 seconds - Sensitivity 279/289 finished
      From worker 4:    79 seconds - Sensitivity 43/289 finished
      From worker 8:    21 seconds - Sensitivity 131/289 finished
      From worker 12:   21 seconds - Sensitivity 207/289 finished
      From worker 3:    78 seconds - Sensitivity 23/289 finished
      From worker 9:    33 seconds - Sensitivity 148/289 finished
      From worker 11:   33 seconds - Sensitivity 183/289 finished
      From worker 13:   21 seconds - Sensitivity 226/289 finished
      From worker 14:   28 seconds - Sensitivity 244/289 finished
      From worker 10:   34 seconds - Sensitivity 164/289 finished
      From worker 7:    20 seconds - Sensitivity 109/289 finished
      From worker 5:    22 seconds - Sensitivity 71/289 finished
      From worker 6:    23 seconds - Sensitivity 91/289 finished
      From worker 8:    21 seconds - Sensitivity 132/289 finished
      From worker 12:   20 seconds - Sensitivity 208/289 finished
      From worker 15:   29 seconds - Sensitivity 261/289 finished
      From worker 16:   29 seconds - Sensitivity 280/289 finished
      From worker 13:   21 seconds - Sensitivity 227/289 finished
      From worker 7:    20 seconds - Sensitivity 110/289 finished
      From worker 5:    23 seconds - Sensitivity 72/289 finished
      From worker 14:   29 seconds - Sensitivity 245/289 finished
      From worker 9:    34 seconds - Sensitivity 149/289 finished
      From worker 11:   34 seconds - Sensitivity 184/289 finished
      From worker 6:    23 seconds - Sensitivity 92/289 finished
      From worker 8:    20 seconds - Sensitivity 133/289 finished
      From worker 12:   21 seconds - Sensitivity 209/289 finished
      From worker 10:   35 seconds - Sensitivity 165/289 finished
      From worker 13:   21 seconds - Sensitivity 228/289 finished
      From worker 7:    21 seconds - Sensitivity 111/289 finished
      From worker 15:   29 seconds - Sensitivity 262/289 finished
      From worker 16:   29 seconds - Sensitivity 281/289 finished
      From worker 5:    23 seconds - Sensitivity 73/289 finished
      From worker 8:    21 seconds - Sensitivity 134/289 finished
      From worker 12:   20 seconds - Sensitivity 210/289 finished
      From worker 14:   28 seconds - Sensitivity 246/289 finished
      From worker 6:    24 seconds - Sensitivity 93/289 finished
      From worker 13:   21 seconds - Sensitivity 229/289 finished
      From worker 2:    74 seconds - Sensitivity 3/289 finished
      From worker 9:    34 seconds - Sensitivity 150/289 finished
      From worker 7:    19 seconds - Sensitivity 112/289 finished
      From worker 11:   34 seconds - Sensitivity 185/289 finished
      From worker 10:   34 seconds - Sensitivity 166/289 finished
      From worker 15:   28 seconds - Sensitivity 263/289 finished
      From worker 3:    78 seconds - Sensitivity 24/289 finished
      From worker 4:    81 seconds - Sensitivity 44/289 finished
      From worker 5:    24 seconds - Sensitivity 74/289 finished
      From worker 8:    20 seconds - Sensitivity 135/289 finished
      From worker 12:   20 seconds - Sensitivity 211/289 finished
      From worker 16:   30 seconds - Sensitivity 282/289 finished
      From worker 6:    24 seconds - Sensitivity 94/289 finished
      From worker 13:   21 seconds - Sensitivity 230/289 finished
      From worker 7:    19 seconds - Sensitivity 113/289 finished
      From worker 14:   29 seconds - Sensitivity 247/289 finished
      From worker 8:    20 seconds - Sensitivity 136/289 finished
      From worker 12:   20 seconds - Sensitivity 212/289 finished
      From worker 9:    34 seconds - Sensitivity 151/289 finished
      From worker 5:    24 seconds - Sensitivity 75/289 finished
      From worker 11:   33 seconds - Sensitivity 186/289 finished
      From worker 15:   28 seconds - Sensitivity 264/289 finished
      From worker 7:    20 seconds - Sensitivity 114/289 finished
      From worker 13:   21 seconds - Sensitivity 231/289 finished
      From worker 10:   34 seconds - Sensitivity 167/289 finished
      From worker 6:    24 seconds - Sensitivity 95/289 finished
      From worker 16:   28 seconds - Sensitivity 283/289 finished
      From worker 14:   29 seconds - Sensitivity 248/289 finished
      From worker 8:    20 seconds - Sensitivity 137/289 finished
      From worker 12:   20 seconds - Sensitivity 213/289 finished
      From worker 5:    23 seconds - Sensitivity 76/289 finished
      From worker 7:    20 seconds - Sensitivity 115/289 finished
      From worker 13:   21 seconds - Sensitivity 232/289 finished
      From worker 15:   27 seconds - Sensitivity 265/289 finished
      From worker 9:    33 seconds - Sensitivity 152/289 finished
      From worker 6:    24 seconds - Sensitivity 96/289 finished
      From worker 11:   32 seconds - Sensitivity 187/289 finished
      From worker 16:   28 seconds - Sensitivity 284/289 finished
      From worker 2:    73 seconds - Sensitivity 4/289 finished
      From worker 10:   32 seconds - Sensitivity 168/289 finished
      From worker 14:   25 seconds - Sensitivity 249/289 finished
      From worker 7:    19 seconds - Sensitivity 116/289 finished
      From worker 5:    21 seconds - Sensitivity 77/289 finished
      From worker 3:    76 seconds - Sensitivity 25/289 finished
      From worker 4:    77 seconds - Sensitivity 45/289 finished
      From worker 6:    23 seconds - Sensitivity 97/289 finished
      From worker 15:   27 seconds - Sensitivity 266/289 finished
      From worker 7:    18 seconds - Sensitivity 117/289 finished
      From worker 16:   26 seconds - Sensitivity 285/289 finished
      From worker 9:    32 seconds - Sensitivity 153/289 finished
      From worker 11:   31 seconds - Sensitivity 188/289 finished
      From worker 5:    21 seconds - Sensitivity 78/289 finished
      From worker 14:   26 seconds - Sensitivity 250/289 finished
      From worker 10:   31 seconds - Sensitivity 169/289 finished
      From worker 6:    18 seconds - Sensitivity 98/289 finished
      From worker 7:    19 seconds - Sensitivity 118/289 finished
      From worker 15:   27 seconds - Sensitivity 267/289 finished
      From worker 5:    21 seconds - Sensitivity 79/289 finished
      From worker 16:   26 seconds - Sensitivity 286/289 finished
      From worker 6:    18 seconds - Sensitivity 99/289 finished
      From worker 14:   26 seconds - Sensitivity 251/289 finished
      From worker 9:    31 seconds - Sensitivity 154/289 finished
      From worker 11:   31 seconds - Sensitivity 189/289 finished
      From worker 10:   31 seconds - Sensitivity 170/289 finished
      From worker 2:    69 seconds - Sensitivity 5/289 finished
      From worker 5:    21 seconds - Sensitivity 80/289 finished
      From worker 15:   26 seconds - Sensitivity 268/289 finished
      From worker 16:   26 seconds - Sensitivity 287/289 finished
      From worker 3:    72 seconds - Sensitivity 26/289 finished
      From worker 9:    30 seconds - Sensitivity 155/289 finished
      From worker 4:    73 seconds - Sensitivity 46/289 finished
      From worker 11:   30 seconds - Sensitivity 190/289 finished
      From worker 10:   30 seconds - Sensitivity 171/289 finished
      From worker 15:   25 seconds - Sensitivity 269/289 finished
      From worker 16:   26 seconds - Sensitivity 288/289 finished
      From worker 9:    30 seconds - Sensitivity 156/289 finished
      From worker 11:   33 seconds - Sensitivity 191/289 finished
      From worker 15:   25 seconds - Sensitivity 270/289 finished
      From worker 10:   30 seconds - Sensitivity 172/289 finished
      From worker 16:   26 seconds - Sensitivity 289/289 finished
      From worker 2:    66 seconds - Sensitivity 6/289 finished
      From worker 11:   29 seconds - Sensitivity 192/289 finished
      From worker 10:   29 seconds - Sensitivity 173/289 finished
      From worker 3:    70 seconds - Sensitivity 27/289 finished
      From worker 4:    71 seconds - Sensitivity 47/289 finished
      From worker 11:   29 seconds - Sensitivity 193/289 finished
      From worker 10:   29 seconds - Sensitivity 174/289 finished
      From worker 11:   17 seconds - Sensitivity 194/289 finished
      From worker 2:    65 seconds - Sensitivity 7/289 finished
      From worker 10:   29 seconds - Sensitivity 175/289 finished
      From worker 3:    68 seconds - Sensitivity 28/289 finished
      From worker 4:    69 seconds - Sensitivity 48/289 finished
      From worker 2:    64 seconds - Sensitivity 8/289 finished
      From worker 3:    67 seconds - Sensitivity 29/289 finished
      From worker 4:    68 seconds - Sensitivity 49/289 finished
      From worker 4:    19 seconds - Sensitivity 50/289 finished
      From worker 2:    65 seconds - Sensitivity 9/289 finished
      From worker 4:    19 seconds - Sensitivity 51/289 finished
      From worker 4:    20 seconds - Sensitivity 52/289 finished
      From worker 3:    67 seconds - Sensitivity 30/289 finished
      From worker 4:    20 seconds - Sensitivity 53/289 finished
      From worker 2:    65 seconds - Sensitivity 10/289 finished
      From worker 4:    20 seconds - Sensitivity 54/289 finished
      From worker 4:    20 seconds - Sensitivity 55/289 finished
      From worker 3:    68 seconds - Sensitivity 31/289 finished
      From worker 4:    21 seconds - Sensitivity 56/289 finished
      From worker 4:    19 seconds - Sensitivity 57/289 finished
      From worker 2:    65 seconds - Sensitivity 11/289 finished
      From worker 4:    20 seconds - Sensitivity 58/289 finished
      From worker 4:    19 seconds - Sensitivity 59/289 finished
      From worker 3:    68 seconds - Sensitivity 32/289 finished
      From worker 4:    20 seconds - Sensitivity 60/289 finished
      From worker 2:    65 seconds - Sensitivity 12/289 finished
      From worker 3:    66 seconds - Sensitivity 33/289 finished
      From worker 2:    63 seconds - Sensitivity 13/289 finished
      From worker 3:    66 seconds - Sensitivity 34/289 finished
      From worker 2:    63 seconds - Sensitivity 14/289 finished
      From worker 3:    66 seconds - Sensitivity 35/289 finished
      From worker 2:    63 seconds - Sensitivity 15/289 finished
      From worker 3:    66 seconds - Sensitivity 36/289 finished
      From worker 2:    65 seconds - Sensitivity 16/289 finished
      From worker 3:    66 seconds - Sensitivity 37/289 finished
      From worker 2:    65 seconds - Sensitivity 17/289 finished
      From worker 3:    67 seconds - Sensitivity 38/289 finished
      From worker 2:    65 seconds - Sensitivity 18/289 finished
      From worker 3:    68 seconds - Sensitivity 39/289 finished
      From worker 2:    65 seconds - Sensitivity 19/289 finished
      From worker 3:    66 seconds - Sensitivity 40/289 finished
      From worker 2:    65 seconds - Sensitivity 20/289 finished
C:\temp\jl_8YHakp
1502.335118 seconds (9.77 M allocations: 690.829 MiB, 0.01% gc time, 0.03% compilation time)

julia> 

Dagger.jl is a library that builds a scheduler on top of Distributed.jl, and allows similar semantics to Threads.@spawn (in the form of Dagger.@spawn) for launching work. While I can’t guarantee the scheduler will do exactly what you want, it is often better than @distributed for problems which have uneven work distributions.
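
A minimal sketch of what that could look like, assuming Dagger is installed and loaded on every worker. `run_scenario` is a hypothetical stand-in for the real per-scenario function, and the worker count and sleep times are made up for illustration. Each iteration becomes its own task, so no fixed block of iterations is bound to a worker in advance; how the tasks are actually placed is up to Dagger's scheduler.

```julia
using Distributed
addprocs(4)                  # assumption: 4 local workers, adjust as needed
@everywhere using Dagger     # Dagger must be loaded on all processes

# Hypothetical stand-in for one sensitivity run; the first few are slower.
@everywhere function run_scenario(i)
    sleep(i <= 3 ? 0.6 : 0.2)
    return [i]               # return value to be concatenated, like keym
end

sz = 12

# Launch one task per scenario; Dagger.@spawn returns immediately.
tasks = [Dagger.@spawn run_scenario(i) for i in 1:sz]

# fetch waits for each task; results come back in the order of `tasks`.
keymetrics = reduce(vcat, fetch.(tasks))
```

The `reduce(vcat, ...)` at the end plays the role of the `(vcat)` reducer in the `@distributed` loop.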

If you end up using Dagger and run into problems, feel free to ping me here or file an issue.
