Benchmarks can be quite tricky. It’s my general impression that Reduce is rather fast for computer algebra timings, at least in the particular Lisp tuned to support it (Codemist Standard Lisp).
Is it plausible that some benchmark will run 10X faster on the same machine? Well, if it could use 10 processors instead of one, maybe. Many important CAS algorithms don’t seem to be very prone to parallelism, though. So it would be tough.
How then could 10x speedup be achieved? Sometimes by hacking the program, deliberately or accidentally, to run the benchmark. For example, if you notice that all the integers in the test are exactly representable in one word, use fixnum arithmetic instead of arbitrary-precision. Makes polynomial arithmetic much faster. And maybe one program has bignum exponents and can represent x^(2^65) -1, but the other cannot. Old versions of Maple did not allow the construction of sums of more than 2^N terms with N being smallish, like 16.
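To make the one-word shortcut concrete, here is a minimal Python sketch of the kind of check a system might make before picking its arithmetic. The 64-bit word size and the names `fits_in_word` and `choose_arithmetic` are assumptions for illustration, not taken from any particular CAS.

```python
WORD_BITS = 64  # assumed machine word size; hypothetical, adjust for the target


def fits_in_word(n, bits=WORD_BITS):
    """True if n is representable as a signed two's-complement integer of `bits` bits."""
    limit = 1 << (bits - 1)
    return -limit <= n < limit


def choose_arithmetic(coeffs, exponents):
    """Return 'fixnum' if every coefficient and exponent fits in one word, else 'bignum'."""
    values = list(coeffs) + list(exponents)
    if all(fits_in_word(v) for v in values):
        return "fixnum"
    return "bignum"


# x^(2^65) - 1: the coefficients are tiny, but the exponent needs more than one word.
print(choose_arithmetic([1, -1], [2**65, 0]))
# A small test polynomial such as 3*x^10 - 7 stays entirely within one word.
print(choose_arithmetic([3, -7], [10, 0]))
```

A check on the inputs alone is not enough in practice: products and sums of word-sized coefficients can overflow, so a system that takes this shortcut still has to bound or test the intermediate results. A benchmark that never leaves the one-word range will not exercise that path.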
Sometimes there is a difference because of rather different algorithms. For instance, multiplication of polynomials in a finite field using discrete Fast Fourier Transforms vs. a naive method. But this is not inherent in the choice of language – the FFT could be written in Lisp or Julia or called from some external hot-shot FFT library. So such a benchmark is more of a marketing data point (“Here’s today’s best system”) than an inherent “X is better than Y because only X can ever do …”
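As a rough illustration of the algorithmic gap, the sketch below multiplies two polynomials over a prime field in two ways: the schoolbook method and a number-theoretic transform (the finite-field analogue of the FFT). The modulus 998244353 with primitive root 3 is an assumption chosen for convenience because it supports power-of-two transform lengths; the function names are hypothetical. Nothing here is specific to Python, which is the point: the same two routines could be written in Lisp or Julia.

```python
P = 998244353  # assumed prime modulus, 119 * 2^23 + 1
G = 3          # a primitive root modulo P


def naive_mul(a, b):
    """Schoolbook product of coefficient lists mod P: O(len(a) * len(b)) operations."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def ntt(v, invert=False):
    """In-place iterative transform mod P; len(v) must be a power of two."""
    n = len(v)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            v[i], v[j] = v[j], v[i]
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)      # its inverse, via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = v[k]
                t = v[k + half] * w % P
                v[k] = (u + t) % P
                v[k + half] = (u - t) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            v[i] = v[i] * n_inv % P


def ntt_mul(a, b):
    """Product of coefficient lists mod P using transforms: O(n log n) operations."""
    if not a or not b:
        return []
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n <<= 1
    fa = [x % P for x in a] + [0] * (n - len(a))
    fb = [x % P for x in b] + [0] * (n - len(b))
    ntt(fa)
    ntt(fb)
    fc = [x * y % P for x, y in zip(fa, fb)]
    ntt(fc, invert=True)
    return fc[:size]


# (1 + 2x + 3x^2) * (4 + 5x), computed both ways.
print(naive_mul([1, 2, 3], [4, 5]))
print(ntt_mul([1, 2, 3], [4, 5]))
```

Both routines return the same coefficients; only the operation count differs, and that difference shows up in timings only once the degrees are large. A benchmark comparing a system that has the second routine against one that has only the first is measuring which library was linked in, not the host language.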
To some extent, CAS may all run at just about the same speed for “asymptotically large” problems if they take advantage of the FOSS code from people hacking to make it faster and faster. Libraries like NTL, FLINT, GMP, and various FFT libraries can be accessed from almost any language with some effort.
There are some arguments that the only timings worth comparing are for long-running tasks. After all, the short ones just finish right away. [The reality may be that efficiency matters on small problems because you do zillions of them…]
Regarding a note that maybe CAS authors would say what they would do differently – Many of the authors of the circa 1960-1980 systems are still alive.
Ask them.
I would not expect Stephen Wolfram to say what he would do differently, though some of the co-authors might be more forthcoming. (The Mathematica ethos appears to be marketing-driven and defensive. The tendency seems to be to declare that design flaws are actually features.)
RJF