Simple PDE benchmark

antoine-levitt · December 1, 2017, 8:16am

Hi,

I made a simple benchmark to try and convince colleagues to switch to Julia. It might be of interest to people, so here it is : https://github.com/antoine-levitt/benchmark_heat. It’s the most simple PDE code I could cook up (1D, explicit euler, finite differences), and it’s designed to highlight differences between languages for simple kernels, and in particular loops vs vectorized code. I compare C (gcc and clang with various flags), julia (loops and broadcast, different annotations/versions), and numpy/scilab/octave/matlab (loops and vector)

Some highlights of my very rudimentary benchmarking (best of 3 on my laptop):

There’s still a penalty for writing vector code in julia (doesn’t do SIMD?)
Julia vectorized is the fastest of any vectorized versions, and in particular has a factor of three speedup between 0.6 and 0.7!
Julia loops beats everybody, even the C version when compiled with clang (which I don’t understand)
Loops in matlab have gotten fast in the latest versions, to the point where they even beat the vectorized version. That must have been huge work, kudos to them!

mauro3 · December 1, 2017, 8:49am

Thanks!

Last time I checked (a few years ago), the restrictions on the loops to allow JITing were pretty severe. E.g. no call to m-file defined functions.

lobingera · December 1, 2017, 9:03am

“fast in the latest versions” - “Last time I checked (a few years ago)” … maybe you should recheck

dlfivefifty · December 1, 2017, 10:01am

It looks like your vectorised version is allocating slices. What if you use views instead?

antoine-levitt · December 1, 2017, 10:12am

I do:

@views @. u_next[2:Nx-1] = u[2:Nx-1] + D*dt/(dx*dx)*(u[3:Nx]-2u[2:Nx-1]+u[1:Nx-2]) - v*dt/dx*(u[3:Nx] - u[2:Nx-1])

Removing the views and the dot each add to the computing time.

cortner · December 1, 2017, 2:25pm

This is the key issue in matlab: cost of function calls

ChrisRackauckas · December 1, 2017, 2:51pm

Yup, these numbers are quite similar to the ones I’ve been seeing after writing tons of these things . You even get the Julia beating C part as well. I’ll have to start sharing this.

ScottPJones · December 1, 2017, 3:19pm

Have you tried using the @simd macro?

antoine-levitt · December 1, 2017, 4:16pm

Not sure how to apply simd to a broadcast, isn’t it supposed to be before a loop?

@ChrisRackauckas how do you explain beating C? I don’t see what julia knows that C doesn’t here. Could it be that clang as a front-end does not produce as good a llvm IR as julia? gcc was about the same as julia in my example.

ChrisRackauckas · December 1, 2017, 4:28pm

Here? No idea. Basically, our benchmarks routinely show we beat a lot of the C/Fortran methods by about 8x-10x. In this post:

I explain how most of that is explained by late binding (compiling the solver with the ODE function in that. A package which provides a solver can’t do this). However, there’s still this curious ~2x I don’t understand. In your example, you essentially do the “late binding” by writing it in there, but you also re-create the unexplained difference (at least when compiling with Clang). I said the numbers lined up with what I’ve seen (vs MATLAB/Python/etc as well), but there are some pieces of it that I don’t quite understand why they exist yet.

PetrKryslUCSD · December 1, 2017, 4:37pm

I am a bit confused: there is only one source file for each language. How did you construct the table? Shouldn’t there be multiple versions (vectorized, loops)?

PetrKryslUCSD · December 1, 2017, 4:46pm

I think the various versions are derived by fiddling with the comments. Am I right?

antoine-levitt · December 1, 2017, 5:23pm

Yes. I’m a very low-tech person

PetrKryslUCSD · December 1, 2017, 5:27pm

If I may make a suggestion: generating multiple versions iin separate files might be preferable if it is to be generally useful to the readers of the discourse. Right now, at least for some of the codes, it is ambiguous how to derive any particular version. What to leave out and what to keep is obviously going to affect the timing (and possibly the correctness of the result as well).

antoine-levitt · December 1, 2017, 5:53pm

I just made this to figure out roughly what the timings were, uploaded this to github to save it in case I want to look at it in a year, and posted it here in case people were interested. Anybody is very welcome to take the code and put it in a benchmarking suite if they wish, but I have no interest in making it systematic.

Topic		Replies	Views
Loop vs vectorization Performance	4	1491	October 10, 2023
Matlab versus Julia General Usage	33	4947	July 15, 2021
Julia can be dramatically slower than Matlab when solving 2D PDE New to Julia performance , matlab , pde	42	1237	November 9, 2024
MATLAB outperforms Julia (20 times faster) running this nested loop Performance question , performance , matlab , time , finitediff	20	5403	January 10, 2023
Arithmetic broadcasting in Julia 5x slower than MATLAB Performance	17	1071	May 26, 2022

Simple PDE benchmark

Related topics