Benchmarking a simple PDE algorithm in Julia, Python, Matlab, C++, and Fortran

That is a good question, but it’ll take someone with a better understanding of compiler internals and assembly language than me to answer it.

The thing to do would be to compare the x86-64 assembly that LLVM produces from the Julia code with the assembly that gcc produces from the Fortran code. But that’s a lot of assembly language.
Running @code_llvm ksintegrateUnrolled(u, Lx, dt, Nt) produces about a thousand lines of LLVM intermediate representation (IR), and @code_native produces several times that much assembly.

It’s possible to dig through the LLVM IR or assembly and focus on code for the time-stepping loop, but it’s still beyond my understanding.
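
For anyone who wants to try that kind of digging, here is a rough sketch of how one might capture the output as strings and filter it rather than scrolling through it. Note that toystep! is a made-up stand-in I wrote for illustration, not the benchmark's ksintegrateUnrolled, and searching for "fmul" is just one guess at how to spot the floating-point work in the loop.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

# Hypothetical stand-in for a time-stepping kernel (NOT the benchmark code).
function toystep!(u::Vector{Float64}, dt::Float64, Nt::Int)
    for n in 1:Nt
        @inbounds for i in eachindex(u)
            u[i] += dt * u[i]
        end
    end
    return u
end

argtypes = (Vector{Float64}, Float64, Int)

# sprint passes an IO buffer as the first argument, so the listings
# end up in strings instead of being printed to stdout.
ir  = sprint(code_llvm, toystep!, argtypes)
asm = sprint(code_native, toystep!, argtypes)

println("LLVM IR lines:  ", length(split(ir, '\n')))
println("assembly lines: ", length(split(asm, '\n')))

# Print only the IR lines that mention a floating-point multiply.
for line in split(ir, '\n')
    occursin("fmul", line) && println(line)
end
```

The same approach should work on the real function by swapping in its name and argument types, though the filtered output will still take some compiler knowledge to interpret.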

Even so, the ksintegrateUnrolled function in Julia 1.2.0 is only 15% slower than the equivalent Fortran code.
