For Monte Carlo simulations using the same code and the same algorithm, how fast is Julia compared with Fortran?

Disclaimer: I am not an expert on Monte Carlo simulations, random number generators, etc., so my comment below will not touch on these aspects. My main field of expertise is deterministic discretizations of (ordinary and partial) differential equations.

In my experience: Yes, Julia can be as fast as Fortran. We are working on discretizations of hyperbolic PDEs in Trixi.jl. Many members of our group are also the core developers of FLUXO, an HPC Fortran code for hyperbolic-parabolic PDEs (mainly Navier-Stokes and magnetohydrodynamics). Both Trixi.jl (Julia) and FLUXO (Fortran) implement a common subset of algorithms. We compared their serial performance on this common subset for 3D simulations of compressible flows on curvilinear meshes. In our benchmarks published in a preprint, our Julia code was never slower than the Fortran code - in fact, it was sometimes even more than 2x faster.

However, you need to be careful when writing your Julia code to get this performance. Of course, your starting point should be the performance tips in the docs.

Even if you have carefully considered the performance tips in the docs, there are still some caveats when comparing Julia to Fortran. First, Julia uses LLVM, which will not generate fused multiply-add (FMA) instructions by default, unlike optimizing Fortran/C/C++ compilers such as GCC or the Intel compilers; see also my comment in another thread of yours. TL;DR: Use @muladd from MuladdMacro.jl.
Second, Fortran compilers may inline more aggressively - you can nudge the Julia compiler to do the same via @inline.
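
To make these two points concrete, here is a minimal sketch using only Base Julia. The names `axpy` and `axpy!` are hypothetical and chosen for illustration; they are not taken from Trixi.jl. `muladd` allows (but does not force) the compiler to fuse the multiply and the add, and as far as I know @muladd from MuladdMacro.jl rewrites ordinary expressions such as `a * x + y` into such `muladd` calls for you.

```julia
# Hypothetical scalar kernel: computes a*x + y.
# `muladd` permits the compiler to emit a single FMA instruction if the hardware supports it.
# `@inline` asks the compiler to inline this small function into its callers.
@inline axpy(a, x, y) = muladd(a, x, y)

# Hypothetical loop that applies the kernel element-wise and overwrites `y`.
function axpy!(y::AbstractVector, a, x::AbstractVector)
    @assert length(x) == length(y)
    @inbounds for i in eachindex(x, y)
        y[i] = axpy(a, x[i], y[i])
    end
    return y
end

y = axpy!([1.0, 2.0, 3.0], 2.0, [10.0, 20.0, 30.0])
println(y)
```

Whether an FMA instruction is actually generated depends on your CPU and Julia version, so it is worth checking the generated code rather than assuming it.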

Finally, I can only recommend benchmarking and profiling your code. For example, you can run some simulations and get a nice visualization of the profiling data via @profview from ProfileView.jl. If you discover that the culprit is a function that looks basically the same in Julia and Fortran, you can compare their assembly code to see whether there is any substantial difference. I described that process and the resulting optimizations and performance improvements in a blog post on optimizing our hyperbolic PDE framework Trixi.jl.
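
As a rough sketch of the assembly comparison on the Julia side, you can capture the native code of a function with `code_native` from the InteractiveUtils standard library. The functions `plain` and `fused` and the helper `has_fma` are hypothetical names for illustration, and the string search is only a crude heuristic, not a robust analysis.

```julia
using InteractiveUtils  # standard library; provides code_native and @code_native

# Two hypothetical variants of the same scalar kernel.
plain(a, x, y) = a * x + y          # written as a separate multiply and add
fused(a, x, y) = muladd(a, x, y)    # allows the compiler to emit an FMA

# Capture the native assembly for Float64 arguments as strings.
types = (Float64, Float64, Float64)
asm_plain = sprint(code_native, plain, types)
asm_fused = sprint(code_native, fused, types)

# Crude heuristic: FMA mnemonics contain "fmadd" on x86 (vfmadd...) and on AArch64 (fmadd).
# The outcome depends on the CPU and the Julia version.
has_fma(asm) = occursin(r"fmadd"i, asm)

println("plain uses FMA: ", has_fma(asm_plain))
println("fused uses FMA: ", has_fma(asm_fused))
```

In an interactive session, `@code_native plain(2.0, 3.0, 1.0)` prints the same assembly directly, which is usually more convenient for reading it side by side with the Fortran compiler's output.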

Once you have reduced the performance differences to some minimal examples, you can usually get great feedback here on Discourse, with many tips on how to improve the performance of your Julia code (paste minimal working examples in both Julia and Fortran).
