A + x*I is hitting a specialized method, whereas A' + x*I hits a generic method that mutates one element of the diagonal at a time, which is very slow for sparse matrices.
Should be easily fixable by adding some specialized methods, e.g.
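For instance, something along these lines could route the adjoint case to the fast sparse path (a hedged sketch, not the actual fix; `plus_scaled_identity` is a made-up name, not a SparseArrays method):

```julia
using SparseArrays, LinearAlgebra

# Sketch only: handle Adjoint-of-sparse plus a scaled identity by
# materializing the adjoint once (an O(nnz) operation), instead of the
# generic fallback that updates the diagonal entry by entry.
# `plus_scaled_identity` is a hypothetical name for illustration.
function plus_scaled_identity(A::Adjoint{<:Any,<:SparseMatrixCSC}, J::UniformScaling)
    return sparse(A) + J   # sparse(A) copies the adjoint into a SparseMatrixCSC
end

A = sprand(100, 100, 0.05)
B = plus_scaled_identity(A', 2.0I)
```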
It took me a long time to understand why my bifurcation code for some PDE was really slow
This thread shows why. Do you know of a package which solves this?

I think the problem with the current sparse view is that it creates a wrapper where you can access individual matrix elements, but these accesses are very slow for sparse matrices (each one requires a searchsorted), and other operations beyond getindex(A, i, j) are not well-optimized.
As a result, whenever I need a view of a sparse matrix, I build it myself using the fields of the SparseMatrixCSC type. Once you get the hang of it, it’s not overly difficult.
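For example, here is the kind of hand-rolled construction I mean (a sketch with an assumed helper name, not a library API): materializing a block of columns straight from the `colptr`/`rowval`/`nzval` fields, with no per-element `searchsorted`:

```julia
using SparseArrays

# Sketch: build A[:, j1:j2] directly from the CSC fields. In CSC storage
# a contiguous column range shares a contiguous slice of rowval/nzval,
# so this is a pair of array copies, not n*m getindex calls.
# `colrange` is a hypothetical name for illustration.
function colrange(A::SparseMatrixCSC, j1::Integer, j2::Integer)
    r = A.colptr[j1]:(A.colptr[j2+1] - 1)             # nonzeros of columns j1:j2
    colptr = A.colptr[j1:j2+1] .- (A.colptr[j1] - 1)  # re-base to start at 1
    return SparseMatrixCSC(size(A, 1), j2 - j1 + 1, colptr, A.rowval[r], A.nzval[r])
end

A = sprand(10, 10, 0.3)
B = colrange(A, 2, 5)
```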
In BifurcationKit, I have in many places @view A .... where A is an AbstractMatrix.
Basically, I do a lot of sigma * I + B where B can be a view of a sparse matrix. Maybe I should code this differently, since it allocates anyway.
It happened during fold continuation as in here. I need to form the matrix M_f as the jacobian for Newton iterations. Later, I need to compute the spectrum of dF, so I form dF = @view Mf[1:end-1, 1:end-1] and compute the spectrum. Unfortunately, dF is not well conditioned, so I use a shift-invert strategy, which requires precomputing factorize(sigma * I - dF). In a nutshell, all the problems of this post appear.
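The pattern looks roughly like this (a minimal stand-in: a made-up tridiagonal matrix plays the role of the real bordered Jacobian, and `lu` plays the role of `factorize`). Indexing with `Mf[1:end-1, 1:end-1]` (no `@view`) copies the submatrix into a fresh `SparseMatrixCSC`, so the shift and the factorization hit the sparse-specialized methods:

```julia
using SparseArrays, LinearAlgebra

# Stand-in for the bordered Jacobian M_f (real case: jacobian of the
# fold functional). A tridiagonal matrix keeps the example deterministic.
n  = 100
Mf = spdiagm(0 => fill(2.0, n + 1), 1 => fill(-1.0, n), -1 => fill(-1.0, n))

dF = Mf[1:end-1, 1:end-1]   # plain indexing copies into a SparseMatrixCSC
sigma = -0.5                # shift, chosen away from the spectrum of dF

F = lu(sigma * I - dF)      # sparse LU; this is the expensive precomputation
x = F \ ones(n)             # one shift-invert apply, reused across eigensolves
```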
I know I don't need to form the bordered system to solve with Mf; I could use a bordering strategy instead, it just happens to be slower in my use case.
If I were doing Hopf continuation, it would be the same. Actually, for PD/NS continuation (see here) it would be similar.
Maybe you could look at it the other way, and consider the Mf[1:end-1, 1:end-1] matrix to be your starting point, then either create a bigger block matrix or perform a manual solve for the system of the Newton iteration? I don't know if there are formulas for solving $\begin{pmatrix} M & u \\ v^T & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}$ directly?
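There is in fact a standard formula for this: block (bordered) elimination. Writing the system as M x + u y = a, v'x = b, one solves M x1 = a and M x2 = u, then y = (v'x1 - b) / (v'x2) and x = x1 - y x2. A sketch, assuming M is nonsingular and v' M⁻¹ u ≠ 0 (`solve_bordered` is an illustrative name, not an API):

```julia
using SparseArrays, LinearAlgebra

# Bordered (block elimination) solve of [M u; v' 0] [x; y] = [a; b]
# using only solves with M; one factorization serves both right-hand sides.
function solve_bordered(M, u, v, a, b)
    F  = lu(M)            # sparse LU when M is a SparseMatrixCSC
    x1 = F \ a            # M x1 = a
    x2 = F \ u            # M x2 = u
    y  = (dot(v, x1) - b) / dot(v, x2)
    return x1 - y * x2, y
end

n = 50
M = spdiagm(0 => fill(3.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))
u, v, a = rand(n), rand(n), rand(n)
b = 0.7
x, y = solve_bordered(M, u, v, a, b)
```

This is essentially the bordering strategy mentioned earlier in the thread: it trades one (n+1)×(n+1) factorization for two solves with the n×n block.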
Alternatively, you could keep both M and its principal submatrix in storage, and perform the update manually, with a dedicated method for the case where M is sparse?