I believe that, in theory, a high-level language compiler could generate faster code for an algorithm than a C/C++ compiler if it has more information about the data objects and their usage than the C/C++ compiler is allowed to assume. I don’t know whether Julia has that information. I suspect that in certain cases the answer is yes, but more often the answer is probably no.
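To make "more information than the C/C++ compiler is allowed to assume" concrete, pointer aliasing is the usual example. The sketch below is only an illustration with made-up function names, not anything taken from Julia or from a specific compiler. In the first function the compiler has to assume `dst` and `src` might point to the same object, so it cannot simply rewrite the body as `*dst += 2 * *src`. In the second, the C99 `restrict` qualifier is the programmer promising that they never overlap, which permits that rewrite. A language whose rules guarantee no overlap gives its compiler that freedom without the programmer asking for it.

```c
/* dst and src may alias, so *src must be re-read after the first store. */
void add_twice(int *dst, const int *src) {
    *dst += *src;
    *dst += *src;
}

/* restrict is a promise from the caller that dst and src do not overlap.
   The compiler may read *src once and add it twice. Calling this with
   overlapping pointers is undefined behavior. */
void add_twice_restrict(int *restrict dst, const int *restrict src) {
    *dst += *src;
    *dst += *src;
}
```

Starting from `x = 1`, the call `add_twice(&x, &x)` leaves `x` at 4, while the same function on two separate variables that both hold 1 leaves the destination at 3. Because both results have to be possible, the non-`restrict` version cannot be simplified.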
This question also gets into the “fast enough” question. As programmers (in general) we are not going for the fastest possible execution; we are going for fast enough. If we wanted the absolute fastest, we’d probably be writing in assembler, or compiling down to assembler and then hand-tuning that code.
As an example of that, I was looking at the BLAKE3 reference implementation in Rust. They have a native Rust implementation, then a faster implementation that uses SIMD instructions in a C library, then the fastest implementation that uses SIMD instructions in an assembler file. They obviously felt that the code the C compiler generated was NOT fast enough.
So the “correct” question for the general programming population is: does the language let me write code quickly, is that code easy to maintain, and does it run fast enough?
Probably not the answers you were looking for, but I don’t think Julia will ever run “real” code faster than C++ code built with the Intel compiler.