I’ll jump directly into the behaviour I’m observing; context follows below under a separate header. Direct questions are in bold.
I have a function that computes the determinant as part of it:

```julia
function diag_penalty(m)
    if isnzero(abs(det(m)))
        return 10
    end
    return 0
end
```
When I run my program using this function, the execution takes around 1 minute and 18 seconds (±2 s), as measured by a progress bar from `ProgressMeter` (this function is called 2805 times during that execution).
Looking at `htop` during the execution, I notice that the program uses all CPU threads, yet `Threads.nthreads()` reports `1`. So: **How is `det()` using all available threads, and how can I disable that?**
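My current guess (an assumption at this point, not something I have confirmed) is that `det()` calls into the BLAS library via an LU factorization, and BLAS manages its own thread pool independently of `Threads.nthreads()`. If that is right, the BLAS thread count can be inspected and pinned like this:

```julia
using LinearAlgebra

# BLAS threads are configured separately from Julia's own threads,
# so Threads.nthreads() == 1 does not constrain them.
println(BLAS.get_num_threads())  # often defaults to the number of cores

# Restrict BLAS (and therefore det's factorization) to one thread:
BLAS.set_num_threads(1)
```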
I verified that `det()` causes this behaviour by switching the relevant line so that the function instead reads:

```julia
function diag_penalty(m)
    if any(isnzero.(eigvals(m)))
        return 10
    end
    return 0
end
```
My program then finishes in 20 seconds (±1 s), measured in the same way as before, and according to `htop` the execution happens on a single thread.
It seems odd that the eigenvalue calculation is faster than the determinant calculation, especially since the determinant appears to be parallelized. **How can a calculation using `det()` be slower than one using `eigvals()`?**
My suspicion that this is indeed strange is confirmed by comparing the two in a Jupyter notebook:

```julia
using LinearAlgebra

mats = [rand(22, 22) for _ in 1:200000]

@time for m in mats
    det(m)
end
@time for m in mats
    eigvals(m)
end
```

```
9.433334 seconds (1.20 M allocations: 836.174 MiB, 1.39% gc time)
17.810404 seconds (3.20 M allocations: 7.546 GiB, 1.40% gc time)
```
So the determinant is indeed faster here, but not in my program. I verified with `htop` that the `det()` part of this comparison also runs on all available threads. **What could I be doing wrong in my program that turns a computation that was roughly twice as fast into one that is more than four times as slow?**
## Context

I’m not completely sure which parts of my whole program are relevant, and I have not yet worked out a minimal example, but I will sketch the structure here for context. Questions and suggestions to help pinpoint the issue are welcome.
I’m designing a fitness function to be used with BlackBoxOptim. The fitness function is divided into what I have been calling “penalties”, and `diag_penalty()` above is one of many. The structure of the fitness function is basically:

```julia
function calc_fit(x, parameters; kwargs...)
    m = prep(x)                # prepare input
    weights, ... = parameters  # weights and other parameters handed over to calc_fit()
    penalties = []             # compute and store all penalties
    push!(penalties, diag_penalty(m))
    push!(penalties, diff_penalty(m))
    ...
    return penalties
    # fitness is generally a weighted sum of the penalties,
    # but one of the kwargs used by the aforementioned
    # program returns the penalties instead
end
```
The program mentioned above does not use BlackBoxOptim; it just reads a bunch of inputs to the fitness function from file and evaluates the fitness function on all of them (17745 different inputs). The inputs are the same every time, so `diag_penalty()` is called the same number of times in both tests. For the purpose of this post, I also evaluated the fitness function on all inputs before timing the rest of the program, to make sure the extra time is not somehow spent compiling `det()`.
The matrices are such that the `if` in `diag_penalty()` is triggered equally often in each test, so whatever the program does afterwards is independent of which version of `diag_penalty()` I’m using.
Edit: fixed the number of inputs the test is run on.