It is also natural that the experienced Python developer who wrote the Python version produced a better implementation than the sub-optimal implementations in the other languages. As of today, I could not write the same thing in Python.
@lmiq I didn’t write the C++ and Fortran implementations. They were written by C++ and Fortran developers and were used in the paper mentioned in the README of paugier/nbabel on GitHub. They come from the website https://www.nbabel.org/.
Even for me, and I'm not a great C++ dev, it would be really simple to improve the C++ version, but that would not be very interesting. Of course, C++ can be very fast.
If a Python implementation using Transonic-Pythran is faster than a C++ implementation, it just means that the C++ implementation is not optimal. Behind the scenes, Pythran automatically generates a pure C++ implementation (with no interaction with CPython) from the Python code, so the fast version is C++ as well. It is just much more convenient to write it in Python-Numpy than in C++.
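To illustrate what "writing it in Python-Numpy" looks like, here is a minimal sketch of the kind of kernel Pythran can translate to C++. This is not the code from the nbabel repository: the function name `compute_accelerations` and the simplifications (G = 1, no softening) are my own assumptions for illustration. The `#pythran export` comment gives Pythran the argument types it needs to generate C++. Transonic's own decorators are not shown here.

```python
import numpy as np


#pythran export compute_accelerations(float64[:,:], float64[:])
def compute_accelerations(positions, masses):
    """Pairwise gravitational accelerations (hypothetical kernel, G = 1, no softening)."""
    n = positions.shape[0]
    accelerations = np.zeros_like(positions)
    # Explicit loops over pairs: slow in CPython, but Pythran compiles them to C++ loops.
    for i in range(n - 1):
        for j in range(i + 1, n):
            vector = positions[i] - positions[j]
            distance2 = (vector ** 2).sum()
            distance3 = distance2 * np.sqrt(distance2)
            # Newton's third law: equal and opposite contributions for i and j.
            accelerations[i] -= masses[j] / distance3 * vector
            accelerations[j] += masses[i] / distance3 * vector
    return accelerations
```

Without compilation, this runs as ordinary Python-Numpy. Running `pythran` on the file (e.g. a hypothetical `nbody_kernel.py`) should produce an extension module in which the same function is implemented in generated C++.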
Regarding the compiler flags, they are the same for all the implementations.