It’s impressive that it’s possible to beat optimized C libraries in certain cases with native Julia code, but focusing too much on such cases will often lead people astray. Benchmarks against optimized C code help establish Julia’s performance capabilities, but only represent a starting point for interesting work.
Fundamentally, the reason Julia exists is not to beat the performance of existing C libraries on existing problems. If all the high-performance code you will ever need has already been written, then you don’t have as much need for a new language. Julia is attractive for people who need to write new high-performance code to solve new problems, which don’t fit neatly into the boxes provided by existing numpy library functions.
It’s not that Julia has some secret sauce that allows it to beat C — it is just that its compiled performance is comparable to that of C (boiling down to the same LLVM backend), so depending on the programmer’s cleverness and time it will sometimes beat C libraries and sometimes not.
It is, however, vastly easier to write high-performance Julia code than comparable C code in most cases, because Julia is a higher level language with performant high-level abstractions. This can also make it easier to be clever, with tricks like metaprogramming that are tedious in C. And the code you write in Julia can at the same time be more flexible (type-generic and composable), allowing greater code re-use.
This is important for those of us who want to work on new problems and to go beyond endlessly recycling the solutions of the past.