I’m trying to improve the performance of the LombScargle.jl package. The functions used in two methods involve for-loops that can be computed in parallel, so I thought it would be a nice idea to use Threads.@threads. However, I see a large performance degradation when I do so.
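A minimal sketch of the kind of loop involved, assuming a simplified periodogram-like kernel. The function `periodogram_threaded` and its body are hypothetical, not LombScargle.jl's actual code, and the Lomb-Scargle time offset τ is omitted. Every iteration reads the shared inputs and writes only its own `P[i]`, so the loop can be split across threads just by putting `Threads.@threads` in front of `for`:

```julia
using Base.Threads: @threads

# Hypothetical, simplified periodogram-like loop (NOT LombScargle.jl's code;
# the Lomb-Scargle phase offset τ is omitted). Each iteration reads the
# shared inputs and writes only P[i], so iterations are independent.
function periodogram_threaded(t::Vector{Float64}, s::Vector{Float64},
                              freqs::Vector{Float64})
    P = similar(freqs)
    @threads for i in eachindex(freqs)   # loop body written inline
        ω = 2π * freqs[i]
        c = sn = cc = ss = 0.0
        for j in eachindex(t)
            sinx, cosx = sincos(ω * t[j])
            c  += s[j] * cosx
            sn += s[j] * sinx
            cc += cosx^2
            ss += sinx^2
        end
        P[i] = (c^2 / cc + sn^2 / ss) / 2
    end
    return P
end

t = collect(range(0.01, stop = 10, length = 100))
s = sin.(t)
freqs = collect(range(0.05, stop = 2, length = 50))
P = periodogram_threaded(t, s, freqs)
```

With a single thread this still runs the loop serially; the number of threads is fixed when Julia starts (for example through the `JULIA_NUM_THREADS` environment variable).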
For example, without multi-threading (current master):
julia> using LombScargle, BenchmarkTools
julia> t = collect(linspace(0.01, 10, 100)); s = sin.(t);
julia> @benchmark lombscargle(t, s, fast = false, fit_mean = false)
BenchmarkTools.Trial:
memory estimate: 13.36 kb
allocs estimate: 18
--------------
minimum time: 4.187 ms (0.00% GC)
median time: 4.192 ms (0.00% GC)
mean time: 4.205 ms (0.00% GC)
maximum time: 5.395 ms (0.00% GC)
--------------
samples: 1189
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
Add Threads.@threads before the for loop in question and repeat the same benchmark with just one thread:
julia> @benchmark lombscargle(t, s, fast = false, fit_mean = false)
BenchmarkTools.Trial:
memory estimate: 15.67 mb
allocs estimate: 1026273
--------------
minimum time: 21.778 ms (0.00% GC)
median time: 24.138 ms (7.90% GC)
mean time: 24.079 ms (5.88% GC)
maximum time: 32.270 ms (7.19% GC)
--------------
samples: 208
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
This is a pretty large slowdown: the median time goes from 4.192 ms to 24.138 ms, and allocations jump from 18 to over a million. What’s most interesting is that if I move the code inside the loop into a function of its own and keep multi-threading enabled, performance improves compared with the previous case (this is done in the current multi-threading branch):
julia> @benchmark lombscargle(t, s, fast = false, fit_mean = false)
BenchmarkTools.Trial:
memory estimate: 91.66 kb
allocs estimate: 5023
--------------
minimum time: 4.504 ms (0.00% GC)
median time: 4.555 ms (0.00% GC)
mean time: 4.583 ms (0.17% GC)
maximum time: 6.786 ms (31.55% GC)
--------------
samples: 1091
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
This has the same performance as the non-threaded code, and using more threads speeds up the calculations.
Why does moving the body of the for-loop into a function change the result of the benchmark?
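The variant described above, with the loop body moved into a function of its own, can be sketched like this. `power_at` and `periodogram_barrier` are hypothetical names, and the kernel is a simplified stand-in (τ offset omitted), not the code in the multi-threading branch. The threaded loop now does nothing but call a separately compiled function for each iteration:

```julia
using Base.Threads: @threads

# Hypothetical kernel: the work for one angular frequency ω, moved out of
# the loop into its own function (simplified; τ offset omitted).
function power_at(ω::Float64, t::Vector{Float64}, s::Vector{Float64})
    c = sn = cc = ss = 0.0
    for j in eachindex(t)
        sinx, cosx = sincos(ω * t[j])
        c  += s[j] * cosx
        sn += s[j] * sinx
        cc += cosx^2
        ss += sinx^2
    end
    return (c^2 / cc + sn^2 / ss) / 2
end

# The threaded loop body is now a single call into the kernel.
function periodogram_barrier(t::Vector{Float64}, s::Vector{Float64},
                             freqs::Vector{Float64})
    P = similar(freqs)
    @threads for i in eachindex(freqs)
        P[i] = power_at(2π * freqs[i], t, s)
    end
    return P
end

t = collect(range(0.01, stop = 10, length = 100))
s = sin.(t)
freqs = collect(range(0.05, stop = 2, length = 50))
P = periodogram_barrier(t, s, freqs)
```

Running the same `@benchmark` as in the transcripts above on a variant like this, and watching the `allocs estimate` line, is one way to check whether the threaded loop still allocates on every iteration.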