FastChebInterp’s Clenshaw implementation runs within a factor of 2 of `evalpoly(x, coefs)` (Horner’s rule) on my machine at degree 20, where `coefs` is an array rather than a tuple. And that’s without any particular effort to use SIMD.
(This is for a runtime array of coefficients. Static tuples of coefficients, where the whole polynomial evaluation is unrolled and inlined as in special-function implementations, are a separate ballgame.)
And of course the point is that if you use monomials/Horner at degree 20 or higher, the result can quickly become a numerical disaster for many functions (the monomial basis is badly conditioned, so the coefficients grow large and cancel), which makes the performance comparison irrelevant. For degree $\lesssim 10$, in contrast, it could definitely make sense to convert back to monomial form to squeeze out a few more cycles.
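A minimal Python sketch of that conversion, assuming ascending-order Chebyshev coefficients on $[-1, 1]$ (the name `chebyshev_to_monomial` is hypothetical, not a FastChebInterp function). It builds each $T_k$ in the monomial basis from the three-term recurrence $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$ and accumulates the weighted sum. The result could then be evaluated with Horner’s rule.

```python
def chebyshev_to_monomial(cheb):
    """Convert sum(cheb[k] * T_k(x)) into ascending monomial coefficients.

    Builds the monomial coefficients of each T_k via
    T_{k+1} = 2x*T_k - T_{k-1}, starting from T_0 = 1 and T_1 = x.
    """
    n = len(cheb)
    mono = [0.0] * n
    t_prev, t_curr = [1.0], [0.0, 1.0]  # T_0 and T_1 in monomial form
    for k, c in enumerate(cheb):
        if k == 0:
            t_k = t_prev
        elif k == 1:
            t_k = t_curr
        else:
            # T_k = 2x*T_{k-1} - T_{k-2}
            t_next = [0.0] * (k + 1)
            for j, a in enumerate(t_curr):
                t_next[j + 1] += 2.0 * a  # multiply by 2x: shift up one degree
            for j, a in enumerate(t_prev):
                t_next[j] -= a
            t_prev, t_curr = t_curr, t_next
            t_k = t_next
        for j, a in enumerate(t_k):
            mono[j] += c * a  # add cheb[k] * T_k to the running sum
    return mono
```

The leading coefficient of $T_k$ in monomial form is $2^{k-1}$, so at degree 20 the monomial coefficients reach magnitudes of order $10^5$ or more even though $|T_k(x)| \le 1$ on $[-1, 1]$. The cancellation between such large terms is one way to see why this conversion only pays off at low degree.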