Hello,

tl;dr: can anyone compare the algorithm below to its Matlab equivalent, or suggest improvements to the `\` bottleneck?

I tried to re-implement the Higham-Lin algorithm for fractional matrix powers; code here.

The basic idea of the computation is to take matrix square roots until the norm approaches 1, evaluate a Padé approximant there, then square back up again. Calculating these approximants seems to be the bottleneck, and overall this sophisticated algorithm is about 10-100x slower than the brute-force method in base, which just finds the eigenvalues and eigenvectors, applies the matrix power to the eigenvalues, and recombines everything.
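To illustrate the square-root/Padé/re-square structure, here is a rough sketch; note this is only my illustration, not the actual Higham-Lin algorithm, which works on the triangular Schur factor and uses a higher-order continued-fraction Padé approximant rather than the crude [1/1] one below.

```julia
using LinearAlgebra

# Rough sketch of the inverse scaling-and-squaring idea (NOT the real
# Schur-Pade algorithm): square-root A toward the identity, apply a
# low-order Pade approximant of (I + E)^p, then square back up.
function naive_powerm(A::AbstractMatrix, p::Real; tol = 0.01)
    X = Matrix(float.(A))
    s = 0
    while opnorm(X - I) > tol      # square-root until X is near the identity
        X = sqrt(X)                # principal matrix square root
        s += 1
    end
    E = X - I                      # small perturbation: X = I + E
    # [1/1] Pade approximant of (I + E)^p
    Y = (I + ((1 - p) / 2) * E) \ (I + ((1 + p) / 2) * E)
    for _ in 1:s                   # undo the roots: (X^p)^(2^s) = A^p
        Y = Y * Y
    end
    return Y
end
```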

```
A = rand(100,100)
A = A'*A
m = 7/39

@benchmark A^m
  memory estimate:  512.56 KiB
  allocs estimate:  40
  --------------
  minimum time:     2.364 ms (0.00% GC)
  median time:      2.430 ms (0.00% GC)
  mean time:        2.687 ms (0.84% GC)
  maximum time:     50.587 ms (0.00% GC)

@benchmark SchurPade.powerm(A,m)
  memory estimate:  58.03 MiB
  allocs estimate:  3365420
  --------------
  minimum time:     120.568 ms (4.00% GC)
  median time:      122.703 ms (5.25% GC)
  mean time:        123.034 ms (4.86% GC)
  maximum time:     129.946 ms (2.87% GC)
```
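For reference, the brute-force method in base that this is compared against amounts to something like the following (a sketch for the symmetric positive definite case, as in the benchmark above):

```julia
using LinearAlgebra

# Eigendecomposition-based fractional power: diagonalize, apply the
# power to the eigenvalues, and recombine the factors.
function eig_powerm(A::AbstractMatrix, p::Real)
    F = eigen(Symmetric(Matrix(A)))
    return F.vectors * Diagonal(F.values .^ p) * F.vectors'
end
```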

This is unfortunate. Using `ProfileView`, I see:

[profile flame graph omitted]

So clearly most of the time is spent in the `_powerm_cf` function, calculating the Padé approximation with `((I+S)\A)`. I am not an expert on this: does anyone have advice on analysing it further? If someone has access to Matlab and can compare timings to the original implementation, that would be instructive as well.
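I don't know the internals of `_powerm_cf`, but the 3.3M allocations suggest that each continued-fraction iteration builds `I + S` and a fresh factorization before the solve. A hypothetical in-place variant (the names `solve_step!` and `buf` are mine, not from the package) could reuse preallocated buffers across iterations:

```julia
using LinearAlgebra

# Hypothetical allocation-reducing rewrite of a step like T = (I + S) \ A:
# reuse preallocated buffers and factorize in place instead of allocating
# a new matrix and factorization on every iteration.
function solve_step!(T, buf, S, A)
    copyto!(buf, S)
    for i in axes(buf, 1)
        buf[i, i] += 1            # buf = I + S, without allocating I + S
    end
    F = lu!(buf)                  # in-place LU factorization of the buffer
    copyto!(T, A)
    ldiv!(F, T)                   # T = (I + S) \ A, overwriting T
    return T
end
```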