Review: schur-pade matrix powers speedup?

I’ve been looking at the improved Schur-Pade algorithm for a while. I have a couple of detailed questions arising from the paper that I don’t feel confident in answering on my own.

A key step is analysis of how many expensive square roots have to be taken for the Pade approximant to be good enough – better than machine epsilon. It involves a hard-coded table of constants θ. The key comparison is between various θ and a certain matrix norm, which roughly halves with each square root. I have excerpted the paper here, I hope that is ok with the authors.

  • “we will allow at most two extra square roots” → is this an ad-hoc judgement call, or is there an obvious reason for this design? With the rule of thumb that square roots halve the norm, one square root should suffice, but in any case taking repeated square roots should reduce the norm enough without it being necessary to count how often it has been done.

  • Is the above selection of the Pade degree m logically equivalent to the following?

    1. if α_3 or α_4 < θ_6, let m be the smallest s.t. α_3 < θ_m;
    2. if 1/2*α_3 < θ_5, take another square root and start again;
    3. if α_3 or α_4 < θ_7, let m=7.

I realise these are very detailed questions. As this is outside of my expertise, I don’t feel comfortable going ahead without review. (I can implement their algorithm, but it’s no fun if I don’t believe it …) Any comment appreciated!