Benchmark for latest Julia?

SciLua perhaps?

Good job getting the benchmarks up to date!

2 Likes

Thanks, and thanks for the SciLua suggestion! That did the trick. New plot, with just Go and Scala still to come:

Lua looks greatly improved since these were last run. Or maybe C has worsened…that’d be a simpler explanation for so many languages improving in the mandel benchmark. I will double-check the C compilation flags, etc.

3 Likes

It doesn’t look right that Octave and Python are so much slower on rand_mat_mul, since this test mainly measures BLAS performance. Are they linked to the same BLAS?

Thoughts that come to mind:

  1. I am surprised that Fortran is so much slower than C here. The code should be similar, and the compilers should be able to optimize similarly, probably even slightly better with Fortran because of the lack of aliasing. Is that not correct? To me, it looks like the compilation flags need to be changed here.

  2. mandel seems very very weird. I’d be scared of sharing this without an explanation.

  3. MATLAB looks just like I remember it: perfectly fine if you’re doing math on arrays, but don’t parse integers or print strings…

  4. You can see how languages which don’t have strictly-typed lists are really hurt in the quicksort test.

  5. Maybe last time Lua wasn’t using LuaJIT. Which version of Lua is used should be clearly documented.

It can be seen on the old benchmarks here that everything using BLAS has the same performance for rand_mat_mul. mandel looks similarly strange in these old benchmarks.

1 Like

@ChrisRackauckas & @stevengj, thanks. I’ll check all the compilation flags and BLAS linking carefully over the next few days. Some preliminary responses, written while out of the office.

I used the stock Python, R, and Octave packages on my Linux distribution, openSUSE Leap 42.2. These must link to the system OpenBLAS rather than the OpenBLAS within the Julia source tree, which the C, Fortran, and Julia benchmarks link to.

Mandel does seem weird, almost as if the number of iterations got changed inadvertently for C or something. Fortran, too, as if an optimization flag was dropped. However, I am running the benchmarks straight off the existing Makefile, with just a few tweaks for changes in the locations of libraries in the Julia source tree. I’ll double-check.

For Lua, I used SciLua rather than the gsl-shell used for the benchmarks posted at Julia Micro-Benchmarks. The benchmarks at http://scilua.org/ show SciLua is very competitive with Julia on the Julia benchmarks: it beats Julia by a factor between 1.5 and 2 on fib, parseint, and mandel, loses by not quite that much on quicksort, and is roughly equal on others.

Fib seems to be missing from my plots. I’ll check why.

1 Like

Also, I think I should make a PR for the few tweaks I did just to get the benchmarks running again, so that everyone can run the open-source languages themselves and help dig into these questions.

3 Likes

Nice work! Please consider sorting the x axis based on the mean for each language, or something similar. It would be easier to see how well Julia does.
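
A rough sketch of what that ordering could look like (this assumes a hypothetical times dictionary mapping each language to its vector of normalized timings; the notebook’s actual data structures may differ):

# Hypothetical data: language name => vector of benchmark times normalized to C.
times = Dict("julia"   => [1.1, 0.9, 1.3],
             "python"  => [15.0, 70.0, 9.0],
             "fortran" => [0.7, 1.2, 1.0])

langs = collect(keys(times))
order = langs[sortperm([mean(times[l]) for l in langs])]   # x-axis order, fastest first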

1 Like

When we had the “Why not numba?” discussion, I made a graph comparing Julia to Numba. Maybe it would be nice to add Numba alongside Python? It is pretty easy, because all you have to do is decorate the functions with @autojit. It’s just a suggestion though.

4 Likes

Just for the record: on parseintperf.m, the MATLAB profiler (R2017a) reports a runtime split of 45% sprintf, 35% sscanf, 13% random number generation, and 5% assert.

I’ve opened a PR for a few revisions that get test/perf/micro running in 0.7.0-DEV, including switching from gsl-shell to SciLua for the Lua benchmarks, as discussed in issue #14222.

@ChrisRackauckas, as far as I can tell, the Fortran compilation, linking, and execution are all in order. The Make system compiles the Fortran code with -O0, -O1, -O2, -O3 successively, links to the OpenBLAS library within the Julia source tree, and executes each of them five times. You can watch that with make --debug=v benchmarks/fortran.csv. Some Perl postprocessing then extracts the fastest execution of all of them during make benchmarks.csv, and I verified by eye on the data files that this works correctly.

On the other hand, I can see an explicit call to cblas_dgemm in the C randmatmul function in perf.c, whereas the Fortran randmatmul subroutine calls a matmul function. I barely understand Fortran. That function is used in a number of places in perf.f90 but is never defined. What is it, where is it defined, and is it calling BLAS? I don’t know.

The fib test was missing from the results because of an error in the CSV reading in http://nbviewer.jupyter.org/url/julialang.org/benchmarks.ipynb: readtable expects the first line of a CSV file to be a header. I’ve updated this notebook for 0.7.0-DEV, too, and will file a separate PR. The fib data is even missing from Julia Micro-Benchmarks!
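
For reference, a minimal sketch of the workaround (the column names here are my guess, not necessarily the ones the notebook uses):

using DataFrames

# benchmarks.csv has no header row, so don't treat line 1 as one,
# then attach column names afterwards.
df = readtable("benchmarks.csv", header = false)
names!(df, [:language, :benchmark, :time])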

Lua is SciLua-v1.0.0-beta12.

I haven’t yet investigated the poor performance of rand_mat_mul for Python, R, and Octave, or the unexpectedly good performance of mandel for Lua, Java, and JavaScript.

2 Likes

matmul is the Fortran intrinsic for matrix multiplication. Afaik it doesn’t call BLAS, but that could depend on the compiler. Which compiler is the benchmark using?

gfortran, version 7.2 or so.

The Fortran benchmark has called BLAS in the past, judging by people’s reactions. I suppose looking at the git history of perf.f90 and working out whether gfortran’s matmul calls BLAS is the next step.

I’d be surprised if gfortran’s matmul calls BLAS. I have built gcc/gfortran from source plenty of times and never noticed BLAS being a compile-time or run-time dependency. I was thinking more of Intel’s ifort, which is usually bundled with MKL, so I could imagine that ifort’s matmul calls BLAS (MKL).

2 Likes

I think even showing two compilers would be interesting. I think that “when all is said and done, the cost of using Julia is at least less than the efficiency variation between compilers” is a pretty convincing argument that it’s at least fast enough to stop worrying about the language and start worrying about the code. If I had a chart I could point to for that, I would be happy.

2 Likes

gfortran can call BLAS for matmul with large matrices, but that has to be requested explicitly. The BLAS packaged with Julia seems to have an incompatible API, though, so that option is disabled by this section of the perf Makefile:

FFLAGS=-fexternal-blas
#gfortran cannot multiply matrices using 64-bit external BLAS.
ifeq ($(findstring gfortran, $(FC)), gfortran)
ifeq ($(USE_BLAS64), 1)
FFLAGS=
endif
FFLAGS+= -static-libgfortran
endif

Perhaps the lesson is that one can get ideal performance from Fortran, but for this case it takes more thought (arcane compiler flags and knowing where a compatible library lives) or money (e.g., for Intel Fortran) than with Julia.

1 Like

Aha. Is it possible to call BLAS dgemm directly in Fortran, as is done in C? As in perf.c:

double *randmatmul(int n) {
    double *A = myrand(n*n);
    double *B = myrand(n*n);
    double *C = (double*)malloc(n*n*sizeof(double));
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    free(A);
    free(B);
    return C;
}

Intel offers its Fortran compiler for free to open-source contributors, so I can try that.

On a related note, the perf/micro Makefile compiles C with gcc. Does anyone know why not clang?

Of course. A sensible Fortran user would simply write

call dgemm('N','N',n,n,n,1.0d0,A,n,B,n,0.0d0,C,n)

and link to a compatible BLAS (i.e. replace $(LIBBLAS) with -lblas in the Fortran link command in the Makefile).

Do the competition rules preclude the sensible solution?

If the rules require a specific version of BLAS with possibly unexpected integer sizes, one needs to write INTERFACE blocks for the foreign library, with conditional definitions for the various architectures (since it doesn’t look like the openblas build was kind enough to generate Fortran module and/or header files for us).

Incidentally, if you do go with Intel Fortran, you’re supposed to make sure that the threading controls (e.g. OMP_NUM_THREADS=1) are effective. Intel’s threading is complicated.

1 Like

Any idea why this one JavaScript benchmark is so fast?

Something seems to be off with the C mandel[brot] baseline, but it might not even be that, since they beat Julia as well (by a smaller margin).

Before we publish, we should make sure everything is in order (I noticed at least one benchmark elsewhere that was unfair to Julia, exploiting fixed-width ASCII while Julia was slower with variable-width UTF-8).

Should we also take some time to re-assess the Julia benchmark code? For reference:

https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.jl

Why not throw @inbounds on the qsort! test? Julia users who’ve been around for a day would know about that, so it’s not some secret, and this is definitely a case where it might matter (I don’t know for sure without running it myself). randmatstat isn’t doing anything in place, which would be the first thing we’d all suggest if someone posted that code on this forum. These aren’t changes that would alter what’s being tested (unless randmatstat is supposed to test the GC) or things a standard user wouldn’t do, which I think fits the criteria. Though I’m not sure how much it really matters in these cases.
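
For illustration, here is a sketch of what that could look like (it follows the usual in-place quicksort structure rather than quoting perf.jl exactly, and qsort_inbounds! is just a name for the sketch):

# In-place quicksort with bounds checks disabled inside the hot loops.
function qsort_inbounds!(a, lo = 1, hi = length(a))
    i, j = lo, hi
    @inbounds while i < hi
        pivot = a[(lo + hi) >>> 1]
        while i <= j
            while a[i] < pivot; i += 1; end
            while a[j] > pivot; j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
            end
        end
        if lo < j
            qsort_inbounds!(a, lo, j)
        end
        lo, j = i, hi
    end
    return a
end

qsort_inbounds!(rand(5000))   # sorts in place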

Also check whether @simd can matter (that probably requires rebuilding the system image).
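
As a sketch of the kind of change meant here (using a pisum-style reduction as the example, with a name of my own; whether it actually helps would need measuring):

# pisum-like kernel; @simd lets the compiler reorder and vectorize the
# floating-point reduction in the inner loop.
function pisum_simd()
    s = 0.0
    for j = 1:500
        s = 0.0
        @simd for k = 1:10000
            s += 1.0 / (k * k)
        end
    end
    return s
end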