## Confusing behaviour

I’ll jump directly into the behaviour I’m observing; context is below, under the second header. My direct questions are in bold.

I have a function that computes a determinant as part of its work:

```julia
function diag_penalty(m)
    if isnzero(abs(det(m)))
        return 10
    end
    return 0
end
```

When I run my program using this function, the execution takes around 1 minute and 18 seconds (±2 s), as measured by a progress bar from `ProgressMeter` (the function is called 2805 times during that execution). Taking a look at `htop` during the execution, I notice that the program uses all CPU threads, yet `Threads.nthreads()` shows `1`.

so: **How is det() using all available threads and how can I disable it?**
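For context, my understanding is that `Threads.nthreads()` only reports Julia's own task threads, while the BLAS library backing `det()` keeps a separate thread pool. A sketch of how one might inspect and pin that pool (assuming the default OpenBLAS backend):

```julia
using LinearAlgebra

# Julia's own task threads -- this is what Threads.nthreads() reports:
println(Threads.nthreads())

# BLAS keeps its own, separate thread pool, which by default uses
# (roughly) all available cores regardless of Threads.nthreads():
println(BLAS.get_num_threads())

# Pin BLAS to a single thread:
BLAS.set_num_threads(1)
```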

I verified that it is `det()` causing this behaviour by switching the relevant line so that the function instead reads:

```julia
function diag_penalty(m)
    if any(isnzero.(eigvals(m)))
        return 10
    end
    return 0
end
```
```

My program then finishes in 20 seconds (±1 s), measured the same way as before, and according to `htop` the execution happens on one thread.

It seems odd that the eigenvalue calculation is faster than the determinant calculation, especially since the determinant computation appears to be parallelized. **How can a calculation using det() be slower than one using eigvals() here?**
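For what it's worth, my understanding (which may be wrong) is that `det()` on a dense matrix is computed from an LU factorization, so it should cost roughly as much as a single `lu()` call. A quick sanity check of that assumption:

```julia
using LinearAlgebra

m = rand(22, 22)
F = lu(m)                # the factorization det() is, as I understand it, built on
@assert det(F) ≈ det(m)  # det of the LU factors matches det of the matrix
```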

To confirm that this is indeed strange, I compared the two in a Jupyter notebook:

```julia
using LinearAlgebra
mats = [rand(22, 22) for _ in 1:200000]

@time for m in mats
    det(m)
end
# 9.433334 seconds (1.20 M allocations: 836.174 MiB, 1.39% gc time)

@time for m in mats
    eigvals(m)
end
# 17.810404 seconds (3.20 M allocations: 7.546 GiB, 1.40% gc time)
```
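A variant of this comparison that might isolate the threading question is to re-time `det()` with BLAS pinned to a single thread (a sketch; the sample size of 10_000 is an arbitrary choice, not the one from my program):

```julia
using LinearAlgebra

mats = [rand(22, 22) for _ in 1:10_000]

# Default BLAS threading:
@time foreach(det, mats)

# Pin BLAS to one thread and time again, to see whether threading
# overhead dominates at this small matrix size:
BLAS.set_num_threads(1)
@time foreach(det, mats)
```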

So the determinant is indeed faster here, but not in my program. I verified with `htop` that the `det()` part is computed on all available threads. **What can I be doing wrong in my program that makes a computation that was basically twice as fast become much more than four times as slow?**

## Context

I’m not completely sure which parts of my whole program are relevant, and I have not yet worked out a minimal example, but I will sketch the structure here for context. Questions and suggestions to help pinpoint the issue are welcome.

I’m designing a fitness function to be used with BlackBoxOptim. The fitness function is divided into what I have been calling “penalties”, and the above `diag_penalty()` is one of many. The structure of the fitness function is basically:

```julia
function calc_fit(x, parameters; kwargs...)
    m = prep(x)                # prepare input
    weights, ... = parameters  # weights and other parameters handed over to calc_fit()
    penalties = []             # compute and store all penalties
    push!(penalties, diag_penalty(m))
    push!(penalties, diff_penalty(m))
    ...
    return penalties  # fitness is generally a weighted sum of the penalties,
                      # but one of the kwargs used by the aforementioned
                      # program returns the penalties themselves instead
end
```
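As an aside I noticed while writing this up: `penalties = []` creates a `Vector{Any}`, which boxes every element pushed into it, whereas a concretely typed vector stores them inline. A minimal illustration (not my actual code):

```julia
untyped = []         # Vector{Any}: every push! boxes its argument
typed = Float64[]    # Vector{Float64}: elements stored inline, type-stable

push!(untyped, 10.0)
push!(typed, 10.0)

println(eltype(untyped))  # Any
println(eltype(typed))    # Float64
```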

The program mentioned above does not use BlackBoxOptim; it just reads a bunch of inputs to the fitness function from file and evaluates the fitness function on all of them (17745 different inputs). The inputs are the same every time, so `diag_penalty()` is called an equal number of times in both tests. For the purpose of this post, I also evaluated the fitness function on all inputs before executing the rest of the program, to make sure the extra time is not somehow Julia having to compile `det()`.

The matrices are such that the `if` in `diag_penalty()` is triggered equally often in each test, hence whatever the program does afterwards is independent of which version of `diag_penalty()` I’m using.

Edit: fixed the number of inputs the test is run on.