The Julia benchmark does call the “generic” one: `*` … it’s just that the generic routine dispatches to the fast one. Whether `*` calls down to a fast BLAS is a library issue. With a decent Fortran system, `matmul` calls a fast BLAS, and any “official” posted benchmark numbers should reflect such a configuration.
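
To make the dispatch point concrete, here is a minimal sketch, assuming a stock Julia install with the `LinearAlgebra` standard library and dense `Float64` matrices. The matrix size and the variable names are my own choices for illustration, not taken from the benchmark. It compares the generic `*` against a direct `BLAS.gemm` call; the two results should agree up to floating-point rounding.

```julia
using LinearAlgebra

# Illustrative inputs: dense Float64 matrices (size chosen arbitrarily).
A = rand(200, 200)
B = rand(200, 200)

# Generic multiplication operator.
C_generic = A * B

# Explicit BLAS call: 'N' means "do not transpose" for each argument.
C_blas = BLAS.gemm('N', 'N', A, B)

# Compare with a tolerance rather than exact equality.
@assert C_generic ≈ C_blas

# Show which BLAS library is loaded (assumes Julia 1.7 or later).
println(BLAS.get_config())
```

Timing the two calls (for example with `@time` after a warm-up run) is one way to check that they cost about the same on a given setup.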
Frankly, I find this particular benchmark pretty uninteresting. As far as I can tell, its only purpose is to highlight that matrix multiplication via the standard library (if one exists) runs at basically the same speed in every language, properly configured, because every standard library can be configured to call the same fast BLAS.