Someone can probably do better than me on this, but I suppose the answer is that this is “self time.” I.e., time executing code in the body of a function that isn’t easily attributed to another function. For example, time spent on loop overhead. You could ask the same question about why nothing is shown inside getindex.
It’s not a fantastic answer. Hopefully someone else can elaborate.
A remark: I don’t think your profiling here is as good as it could be. Note that you call sort_signature! a bunch of times, but after the first time the input is already sorted and there isn’t much work to do. Notice, for example, that there are no calls to setindex! caught in your profile.
It still has to run your double-loop and make some checks, but the a > b check is always false on the sorted input (so it never needs to swap entries) and the a == b check may never trigger (or will always trigger on the X-th comparison). This also means that branch prediction will do a fantastic job here, giving somewhat optimistic results.
You’d do better to make b = similar(a) and call sort_signature!(copy!(b,a)) so that it has some work to do every time. The branch predictor may still be unusually good in this case, however, due to the short input size. If you generate a new random vector each time, or just use a longer input, the branch predictor will get knocked down to its average performance.
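Here is a rough sketch of that setup. Your sort_signature! isn't reproduced in this thread, so the version below is my guess at its shape (a bubble-sort double loop that flips a sign on each swap and bails out with 0 on equal neighbours), and time_unsorted is a made-up helper name. Treat it as an illustration, not as your actual code.

```julia
using Random: shuffle!

# Hypothetical stand-in for the original function (an assumption, not the
# poster's code): bubble-sorts `v` in place and returns the sign of the
# sorting permutation, or 0 as soon as two equal neighbours are compared.
function sort_signature!(v::AbstractVector)
    signature = 1
    n = length(v)
    for i in 1:n-1
        for j in 1:n-i
            a, b = v[j], v[j+1]
            if a == b
                return 0
            elseif a > b
                v[j], v[j+1] = b, a
                signature = -signature
            end
        end
    end
    return signature
end

# Hypothetical helper: restore the unsorted input before every call so the
# sort has real work to do each time. `a` itself is never modified.
function time_unsorted(a::AbstractVector, reps::Integer)
    b = similar(a)
    total = 0                      # accumulate results so the calls are not optimized away
    t = @elapsed for _ in 1:reps
        total += sort_signature!(copy!(b, a))
    end
    return t, total
end

a = shuffle!(collect(1:8))         # distinct values in random order
t, total = time_unsorted(a, 100_000)
println("seconds: ", t)
```

Note that the measured time includes the copy!, and that the first call also pays for compilation, so it is worth running time_unsorted once before trusting the number.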
All this said, on a function this small I wouldn’t expect profiling to be very instructive. You’d likely do better to benchmark this function and try slight variations on things to make it faster. In general, unpredictable branches (e.g., a > b on unsorted values) are hard on performance. It may be faster to always write back the values (for example, using min and max to choose which goes where, and using some other way to choose whether/how to update signature).
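A sketch of what I mean by always writing back, assuming the function is a bubble-sort double loop that returns the permutation sign (or 0 on duplicates). The name sort_signature_minmax! is made up, and I'm using ifelse as one possible way to update signature without an explicit branch. Whether the compiler really emits branch-free code here, and whether it wins, is something you'd have to benchmark. This version also gives up the early exit on duplicates.

```julia
# Hypothetical variant (an assumption about the original's structure):
# every comparison writes both slots back unconditionally via min/max.
# Assumes element types where min/max behave as plain ordering (no NaN).
function sort_signature_minmax!(v::AbstractVector)
    signature = 1
    has_dup = false
    n = length(v)
    for i in 1:n-1
        for j in 1:n-i
            a, b = v[j], v[j+1]
            v[j], v[j+1] = min(a, b), max(a, b)               # unconditional write-back
            signature = ifelse(a > b, -signature, signature)  # flip sign only if a swap happened
            has_dup |= (a == b)                               # remember duplicates, no early return
        end
    end
    return has_dup ? 0 : signature
end
```

The trade-off is doing stores on every comparison in exchange for fewer data-dependent branches, which tends to matter more as the input gets longer and less predictable.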
I also notice that signature is mostly a parity bit. You could start it at signature = 0, increment it every time you do a swap, and return iseven(signature) ? 1 : -1 (except at your return 0 exit, obviously). I don’t know whether this will be any faster, though.
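For concreteness, the parity idea could look something like this. Again, the loop structure is my assumption about what your function does, and sort_signature_parity! is a made-up name.

```julia
# Hypothetical variant (an assumption about the original's structure):
# `signature` counts swaps and is turned into +1 / -1 only at the end.
function sort_signature_parity!(v::AbstractVector)
    signature = 0
    n = length(v)
    for i in 1:n-1
        for j in 1:n-i
            a, b = v[j], v[j+1]
            if a == b
                return 0                 # duplicate entries: same early exit as before
            elseif a > b
                v[j], v[j+1] = b, a
                signature += 1           # one more swap
            end
        end
    end
    return iseven(signature) ? 1 : -1
end
```

An increment versus a negation is unlikely to be a big difference on its own, so this is mostly worth trying in combination with other changes.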