Hello,

To test the accuracy of the Schur-Padé approximation of `A^r` for a matrix `A` and a real number `r`, I implemented a method `rootm` to calculate `A^(1/q), q::Int`. This is a direct generalisation of `sqrtm = A^(1/2)` (PR20214) and would allow the accurate computation of `A^(p//q)` for any integers `p,q`. The algorithm solves `X^p = A` for `X` using a recurrence relation derived directly from writing out the product `X^p`.

**Anyway, the performance of this code sucks.** Why?

```
using BenchmarkTools, ProfileView

A = randn(127,127)
A = UpperTriangular(schurfact(A'*A)[:T])  # Schur factor T of the symmetric matrix A'*A
@benchmark _sqrtm(A)
# median time: 744.535 μs (0.00% GC)
@benchmark _rootm(A,2,Val{true})
# median time: 145.209 ms (1.93% GC)
@profile _rootm(A,2,Val{true})
ProfileView.view()
```

Using `@code_warntype`, I get no red ink. The algorithm is not *that* much more complex than `sqrtm`, but it is 200x slower. I don't understand the profiling: where does the inference come from, and why are there several distinct calls to `_rootm` in the above profile despite the function being called once?

Performance for `p`th roots with `p>2` is obviously the more interesting point, but `p=2` is a good benchmark and I'd like to understand the performance difference.
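For concreteness, this is what a recurrence of this kind looks like in the `p=2` case, assuming the standard triangular square-root recurrence. It is a sketch of the textbook derivation, not a transcription of `_sqrtm` or `_rootm`. Writing out entry `(i,j)` of `X^2 = A` for upper triangular `X` and `A` gives:

```latex
% Entry (i,j) of X^2 = A, with X and A upper triangular (i <= j)
\sum_{k=i}^{j} x_{ik}\, x_{kj} = a_{ij}

% Diagonal (i = j)
x_{ii} = \sqrt{a_{ii}}

% Off-diagonal (i < j), assuming x_{ii} + x_{jj} \neq 0
x_{ij} = \frac{a_{ij} - \sum_{k=i+1}^{j-1} x_{ik}\, x_{kj}}{x_{ii} + x_{jj}}
```

Each off-diagonal entry needs one inner sum of length `j-i-1`, so the whole solve is `O(n^3)` scalar operations. The recurrence for `p>2` involves more terms per entry and is not shown here.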

Any comments and advice appreciated.