SciLua perhaps?
Good job getting the benchmarks up to date!
Thanks, and thanks for the SciLua suggestion! That did the trick. New plot, with just Go and Scala to go:
Lua looks greatly improved since these were last run. Or maybe C has worsened… that'd be a simpler explanation for so many languages improving in the mandel benchmark. I will double-check the C compilation flags, etc.
It doesn't look right that Octave and Python are so much slower on rand_mat_mul, since this test mainly measures BLAS performance. Are they linked to the same BLAS?
Some thoughts that come to mind:
I am surprised that Fortran is so much higher than C here. The code should be similar, and the compilers should be able to optimize it similarly, probably even slightly better with Fortran because of the lack of aliasing. Is that not correct? To me, it looks like compilation flags need to be changed here.
mandel seems very, very weird. I'd be scared of sharing this without an explanation.
MATLAB looks just like I remember it: perfectly fine if you're doing math on arrays, but don't parse integers or print strings…
You can see how languages that don't have strictly typed lists really suffer in the quicksort test.
Maybe Lua wasn't using LuaJIT the last time. The Lua version used should be clearly documented.
It can be seen on the old benchmarks here that everything using BLAS has the same performance for rand_mat_mul. mandel looks similarly strange in those old benchmarks.
@ChrisRackauckas & @stevengj, thanks. I'll check all the compilation flags and BLAS linking carefully over the next few days. Some preliminary responses, written while out of the office.
I used the stock Python, R, and Octave packages on my Linux distribution, openSUSE Leap 42.2. These must link to the system OpenBLAS rather than the OpenBLAS within the Julia source tree, which the C, Fortran, and Julia benchmarks link to.
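(As a quick sanity check, something like the following shows which BLAS the stock NumPy was built against; this snippet is just illustrative and not part of the benchmark scripts.)

# Print NumPy's build configuration, including the BLAS/LAPACK it links to
# (e.g. openblas vs. the reference BLAS).
import numpy
numpy.show_config()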
Mandel does seem weird, almost as if the number of iterations got changed inadvertently for C or something. Fortran, too, as if an optimization flag was dropped. However, I am running the benchmarks straight off the existing Makefile, with just a few tweaks for changes in the locations of libraries in the Julia source tree. I'll double-check.
For Lua, I used SciLua rather than the gsl-shell used for the benchmarks posted at Julia Micro-Benchmarks. The benchmarks at http://scilua.org/ show SciLua is very competitive with Julia on the Julia benchmarks: it beats Julia by a factor between 1.5 and 2 on fib, parseint, and mandel, loses by not quite that much on quicksort, and is roughly equal on others.
Fib seems to be missing from my plots. I'll check why.
Also, I think I should make a PR for the few tweaks I did just to get the benchmarks running again, so that everyone can run the open-source languages themselves and help dig into these questions.
Nice work! Please consider sorting the x-axis by the mean time for each language, or something similar. It would be easier to see how good Julia is.
When we had the "Why not Numba?" discussion, I made a graph comparing Julia to Numba. Maybe it would be nice to add Numba alongside Python? It is pretty easy, because all you have to do is decorate the functions with @autojit. It's just a suggestion, though.
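For what it's worth, a minimal sketch of what that decoration could look like on a mandel-style kernel (using the current @jit spelling; @autojit is the older name, and this is not taken from the benchmark sources):

# Numba sketch: JIT-compile a small kernel in nopython mode.
from numba import jit

@jit(nopython=True)
def mandel(re, im, maxiter):
    # Iterate z -> z^2 + c and count iterations until |z| exceeds 2.
    c = complex(re, im)
    z = 0.0j
    for n in range(maxiter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return maxiter

print(mandel(-0.5, 0.0, 80))  # the first call compiles; later calls reuse the machine code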
Just for the record: on parseintperf.m, the MATLAB profiler (R2017a) reports a runtime split of 45% sprintf, 35% sscanf, 13% random number generation, and 5% assert.
I've opened a PR for a few revisions that get test/perf/micro running in 0.7.0-DEV, including switching from gsl-shell to SciLua for the Lua benchmarks, as discussed in issue #14222.
@ChrisRackauckas, As far as I can tell, the Fortran compilation, linking, and execution are all in order. The Make system compiles the Fortran code with -O0, -O1, -O2, -O3 successively, links to the OpenBLAS library within the Julia source tree, and executes each of those five times. You can watch that with make --debug=v benchmarks/fortran.csv. Some Perl postprocessing then extracts the fastest execution of all during make benchmarks.csv, and I verified by eye on the data files that that works correctly.
On the other hand, I can see an explicit call to cblas_dgemm in the C randmatmul function in perf.c, whereas the Fortran randmatmul subroutine calls a matmul function. I barely understand Fortran. That function is used in a number of places in perf.f90 but is never defined. What is it, where is it defined, and is it calling BLAS? I don't know.
The fib test was missing from the results because of an error in the CSV table reading in http://nbviewer.jupyter.org/url/julialang.org/benchmarks.ipynb: readtable expects the first line of a CSV file to be a header. I've updated this notebook for 0.7.0-DEV, too, and will file a separate PR. The fib data is even missing from Julia Micro-Benchmarks!
Lua is SciLua-v1.0.0-beta12.
I haven't yet investigated the poor performance of rand_mat_mul for Python, R, and Octave, or the unexpectedly good performance of mandel for Lua, Java, and JavaScript.
matmul is the Fortran intrinsic for matrix multiplication. AFAIK it doesn't call BLAS, but that could depend on the compiler. Which compiler is the benchmark using?
gfortran, version 7.2 or so.
The Fortran benchmark has called BLAS in the past, judging by people's reactions. I suppose looking at the git history of perf.f90 and working out whether gfortran calls BLAS is next.
I'd be surprised if the gfortran matmul calls BLAS. I have built gcc/gfortran from source plenty of times and never noticed BLAS being a compile-time or run-time dependency. I was thinking more of Intel's ifort, which is usually bundled with MKL, so I could imagine that ifort's matmul calls BLAS (MKL).
I think even showing two compilers would be interesting. I think that "when all is said and done, the cost of using Julia is at least less than the efficiency variation between compilers" is a pretty convincing argument that it's at least fast enough to stop worrying about the language and start worrying about the code. If I had a chart I could point to for that, I would be happy.
gfortran can call BLAS for matmul with large matrices, but it must be specified. The BLAS packaged with Julia seems to have an incompatible API, though, so that option is prevented by this section of the perf Makefile:
FFLAGS=-fexternal-blas
#gfortran cannot multiply matrices using 64-bit external BLAS.
ifeq ($(findstring gfortran, $(FC)), gfortran)
ifeq ($(USE_BLAS64), 1)
FFLAGS=
endif
FFLAGS+= -static-libgfortran
endif
Perhaps the lesson is that one can get ideal performance from Fortran, but it takes more thought (arcane compiler flags and knowing where a compatible library lives) or money (e.g., for Intel Fortran) than with Julia in this case.
Aha. Is it possible to call BLAS dgemm directly in Fortran, as done in C? Like in perf.c:
double *randmatmul(int n) {
double *A = myrand(n*n);
double *B = myrand(n*n);
double *C = (double*)malloc(n*n*sizeof(double));
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
n, n, n, 1.0, A, n, B, n, 0.0, C, n);
free(A);
free(B);
return C;
}
Intel offers its Fortran compiler for free to open-source contributors, so I can try that.
On a related note, the perf/micro Makefile compiles C with gcc. Does anyone know why not clang?
Of course. A sensible Fortran user would simply write:
call dgemm('N','N',n,n,n,1.0d0,A,n,B,n,0.0d0,C,n)
and link to a compatible BLAS (i.e., replace $(LIBBLAS) with -lblas in the Fortran link command in the Makefile).
Do the competition rules preclude the sensible solution?
If the rules require a specific version of BLAS with possibly unexpected integer sizes, one needs to write INTERFACE blocks for the foreign library, with conditional definitions for the various architectures (since it doesn't look like the OpenBLAS build was kind enough to generate Fortran module and/or header files for us).
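To make that concrete, here is a rough, untested sketch of such an interface block; the suffixed symbol name dgemm_64_ and the 64-bit integer kinds are assumptions about how Julia's OpenBLAS is built and would need to be verified:

! Untested sketch: explicit interface for an ILP64 dgemm exported as dgemm_64_.
interface
   subroutine dgemm64(transa, transb, m, n, k, alpha, a, lda, b, ldb, &
                      beta, c, ldc) bind(c, name="dgemm_64_")
      use iso_c_binding, only: c_char, c_double, c_int64_t
      character(kind=c_char), intent(in) :: transa, transb
      integer(c_int64_t), intent(in) :: m, n, k, lda, ldb, ldc
      real(c_double), intent(in) :: alpha, beta
      real(c_double), intent(in) :: a(lda,*), b(ldb,*)
      real(c_double), intent(inout) :: c(ldc,*)
   end subroutine dgemm64
end interface
! Every integer passed at the call site must then be integer(c_int64_t).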
Incidentally, if you do go with Intel Fortran, you're supposed to make sure that the threading controls (e.g. OMP_NUM_THREADS=1) are effective. Intel's threading is complicated.
Any idea why this one JavaScript benchmark is so fast?
Something seems off with C mandel[brot], the baseline, but it might not even be that, since they beat Julia as well (by a smaller margin).
Before we publish, we should make sure everything is proper (I noticed at least one benchmark elsewhere that was unfair to Julia, exploiting fixed-width ASCII while Julia was slower with variable-width UTF-8).
Should we also take some time to re-assess the Julia benchmark code? For reference:
https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.jl
Why not throw @inbounds on the qsort! test? It seems like Julia users who've been around for a day would know about that, so it's not like it's some secret, and this is definitely a case where it might matter (I don't know for sure without running it myself). randmatstat isn't doing anything in place, which would be the first thing we'd all suggest if someone posted that on this forum. These aren't changes which would change what's being tested (unless randmatstat is supposed to test the GC) or be things that a standard user wouldn't do, which I think fits the criteria. Though I'm not sure in these cases how much it really matters.
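For concreteness, a rough sketch (not the actual perf.jl code) of what an @inbounds-annotated in-place quicksort could look like:

# Rough sketch: in-place quicksort with bounds checks on the indexing disabled.
function qsort_inbounds!(a, lo=1, hi=length(a))
    i, j = lo, hi
    @inbounds while i < hi
        pivot = a[(lo + hi) >>> 1]
        while i <= j
            while a[i] < pivot; i += 1; end
            while a[j] > pivot; j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
            end
        end
        lo < j && qsort_inbounds!(a, lo, j)
        lo, j = i, hi
    end
    return a
end

Timing something like that against the current version would settle whether it matters here.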
Also check if @simd can matter (with the system image rebuilt, that's probably needed).
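As a generic illustration of the kind of loop where @simd can help (again, not taken from perf.jl):

# The reduction has no cross-iteration dependency other than the accumulator,
# so @simd lets the compiler reassociate and vectorize it.
function scaled_sum(x, y, a)
    s = 0.0
    @inbounds @simd for i in eachindex(x, y)
        s += a * x[i] + y[i]
    end
    return s
end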