I dreamed up a simple-looking task that I thought multithreading could accelerate. Running the file below (which I named threadbench.jl) proved me wrong: with 72 cores and 72 threads, the timing is barely better than with a single thread. Understanding why I don't get a 72x speedup may help me on my long journey toward improving some code I actually care about. Any ideas?
```julia
using Base.Threads

# This function does the same simple job either with or without threading.
function tt(invec; threadit=false)
    outvec = zeros(size(invec))
    if threadit
        @threads for i in 1:size(invec,1)
            outvec[i] = threadid()+invec[i]
        end
    else
        for i in 1:size(invec,1)
            outvec[i] = threadid()+invec[i]
        end
    end
    return outvec
end

# Run the function once to make sure it's compiled
z = rand(1024)
y = tt(z)
y = tt(z,threadit=true) # Exercise all branches to be extra sure

x = round.(rand(2^27),digits=2)
println("Timing with one thread:")
@time y1 = tt(x,threadit=false)
println("Timing with $(Threads.nthreads()) threads:")
@time y72 = tt(x,threadit=true)
```
On my machine, the results are as follows:
```
Timing with one thread:
  1.443072 seconds (2 allocations: 1.000 GiB, 3.54% gc time)
Timing with 72 threads:
  1.263002 seconds (367 allocations: 1.000 GiB)
```
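One diagnostic that might help narrow this down, offered as a sketch rather than a known answer: both timings report a 1 GiB allocation, which is the `zeros` call creating the 2^27-element `Float64` output (2^27 × 8 bytes). So each `@time` measures allocating that buffer (and possibly the operating system mapping in fresh pages as it is first written) as well as the loop itself. The version below uses a hypothetical in-place `tt!` that writes into a preallocated buffer that has already been written once, plus a hypothetical `bench` helper, so that `@time` covers only the loop. Both names are illustrative, not part of threadbench.jl. It uses 2^24 elements to keep memory use modest; calling `bench(2^27)` instead matches the original size.

```julia
using Base.Threads

# Hypothetical in-place variant of tt: the caller supplies outvec, so the
# timed call does not include the zeros() allocation of the output.
function tt!(outvec, invec; threadit=false)
    if threadit
        # eachindex(outvec, invec) throws DimensionMismatch if the sizes differ
        @threads for i in eachindex(outvec, invec)
            outvec[i] = threadid() + invec[i]
        end
    else
        for i in eachindex(outvec, invec)
            outvec[i] = threadid() + invec[i]
        end
    end
    return outvec
end

# Hypothetical benchmark helper: times only the loop, serial vs threaded.
function bench(n)
    # Compile both branches on a small input first.
    small = rand(1024)
    tt!(similar(small), small)
    tt!(similar(small), small; threadit=true)

    x = round.(rand(n), digits=2)
    out = similar(x)
    fill!(out, 0.0)  # write every element once, before any timing

    println("In-place, one thread:")
    @time tt!(out, x; threadit=false)
    println("In-place, $(nthreads()) threads:")
    @time tt!(out, x; threadit=true)
    return nothing
end

# threadbench.jl used 2^27 elements (1 GiB per array); 2^24 keeps this sketch light.
bench(2^24)
```

If the threaded in-place call is still not much faster than the serial one, the remaining cost lies in the loop itself rather than in allocating the output.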