Multithreaded Julia matrix multiply (specifically, Octavian.jl) “dominates” MKL at sizes below 100x100, and remains competitive in performance at larger sizes.
Simply applying LoopVectorization.@tturbo to three nested for loops does well until we run out of L2 cache. Results will of course differ by CPU; that particular CPU has much more L2 cache than most (18 cores, 1 MiB/core).
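As a minimal sketch of the kernel described above, here is a plain triple-loop matrix multiply with `@tturbo` on the outer loops; the function name `matmul!` and the accumulator variable are illustrative, not taken from Octavian.jl itself:

```julia
using LoopVectorization

# Naive triple-loop matmul: @tturbo vectorizes and multithreads
# the loop nest. C is overwritten with A * B.
function matmul!(C, A, B)
    @tturbo for m in axes(A, 1), n in axes(B, 2)
        Cmn = zero(eltype(C))
        for k in axes(A, 2)
            Cmn += A[m, k] * B[k, n]
        end
        C[m, n] = Cmn
    end
    return C
end
```

Calling `matmul!(C, A, B)` on small matrices exercises the regime discussed here; without the cache-level blocking that Octavian.jl adds, performance falls off once the operands no longer fit in L2.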
MKL starts to run away from the competition around 3000x3000 on that computer.
On AMD systems, the small-size performance of Julia matrix multiply is unmatched:
This was on a Ryzen 4900HS.
The DifferentialEquations ecosystem doesn’t really do any hardware-specific optimizations itself yet, but it (and ModelingToolkit in particular) is another great example of how code generation can be leveraged for better problem solving in Julia.
A long-term goal of mine is to work on an SLP vectorizer the ecosystem can use, as well as an SPMD compiler like ISPC. DifferentialEquations would be the target for both of these, but they should be usable by interested Julia projects more generally.
But for now, I still have a lot of loop work ahead of me (in particular, modeling much more complicated loops so they can be optimized and still produce correct results).

