Perhaps I wasn’t explicit enough. If profiling emphasizes LAPACK for this code, then there is something wrong with the profiler. I would ask @felix to report on the version of Julia he is using (presumably ProfileView is faithful to the underlying data), and for anyone else to chime in on the status of profiling in master, if that is the relevant base.
On the other hand, there is inefficiency in the triangular sqrtm
, partly due to type instability (at least in Julia 0.5, the unions of complex and real percolate pretty far down). Using small internal functions to handle the inner loops can improve the performance of these, and would be a good project. Maybe @felix can sneak one in if he is preparing a PR. Here is a draft:
function floop(x,R,i,j)
r = x
@inbounds begin
@simd for k = i+1:j-1
r -= R[i,k]*R[k,j]
end
end
r
end
# context omitted
for i = j-1:-1:1
r = floop(A[i,j],R,i,j)
r==0 || (R[i,j] = r / (R[i,i] + R[j,j]))
end
And while I’m at it, of course @stevengj is right that this algorithm is meant for accuracy. Higham and Lin emphasize applications in finance and health care. So I applaud the OP if he is suggesting that an accurate scheme should be the default here. I also see that Higham and Lin improved their algorithm further (citation ); perhaps that is also amenable.