Following the recent release of Julia 1.0, I updated a small tight-binding benchmark program that I have implemented in Fortran, C++ with Eigen, C++ with Armadillo, and Python/NumPy. Roughly, the Fortran and both C++ versions are equivalent in terms of both LOC and performance. The Julia and NumPy versions are about the same in terms of LOC, roughly half the LOC of the Fortran/C++ versions. The NumPy version, however, is very slow: roughly a factor of 50 slower than Fortran (excluding the part that is just a LAPACK call).
Previously, in the Julia 0.4 timeframe, the Julia version was about half as fast as the Fortran/C++ versions. That version used the Devectorize package, which has been unmaintained for several years now; I was unable to make it work with Julia 0.6.x, let alone 1.0. However, as of Julia 0.6 there is the “@.” macro, which does roughly the same thing as the “@devec” macro from Devectorize(?). With “@.” applied to a few critical operations, Julia 1.0 is a factor of 1.7 slower than Fortran; without “@.”, about a factor of 2.1 slower.
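For readers unfamiliar with it, a minimal sketch of what “@.” does (the arrays here are made up for illustration): it adds a dot to every operator and function call in the expression, fusing the whole thing into a single loop with no intermediate temporaries.

```julia
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = similar(a)

# Without @., every operator must be dotted by hand to get fusion:
c .= a .* b .+ 2 .* a

# @. inserts the dots for you and turns = into .=, so the whole
# right-hand side is computed in one fused, in-place loop:
@. c = a * b + 2 * a
```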
However, if I rewrite those expressions as manual loops, Julia is only a factor of 1.1 slower than Fortran, that is, more or less the same. Very impressive!
It is slightly disappointing, though, that I had to resort to writing manual loops for performance. Is there some trick I’m missing? The expressions in question are all of the form
```julia
@. v[:] = atoms[bj, :] - atoms[bi, :]
```
which I rewrite as an explicit loop:

```julia
for z = 1:3
    v[z] = atoms[bj, z] - atoms[bi, z]
end
```
Does Julia create a copy as part of the slicing operation, or what else makes the array syntax slow? The @time macro does report a lot of allocations for this, whether from an actual copy, an array descriptor for the slice, or something else. Allocations for one particular case:
- Unoptimized: 3.83 M
- Using @.: 2.55 M
- Explicit loops: 4
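For what it’s worth, a small sketch (with made-up stand-ins for the arrays above) that probes whether the slices are the culprit: plain right-hand-side slices each allocate a fresh copy, while the @views macro replaces them with non-copying SubArrays.

```julia
# Hypothetical stand-ins for the arrays in the expression above:
atoms = rand(10, 3)
bi, bj = 1, 2
v = zeros(3)

# Each slice atoms[b, :] allocates a fresh copy, and the
# subtraction allocates a third array for the result:
slice_version(atoms, bi, bj) = atoms[bj, :] - atoms[bi, :]

# @views turns the slices into non-copying views, and the dotted
# in-place assignment reuses v, so no new arrays are created:
view_version!(v, atoms, bi, bj) = (@views v .= atoms[bj, :] .- atoms[bi, :]; v)

slice_version(atoms, bi, bj)        # run once to compile
view_version!(v, atoms, bi, bj)
@show @allocated slice_version(atoms, bi, bj)
@show @allocated view_version!(v, atoms, bi, bj)  # much smaller; typically 0 on recent Julia
```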
Is there anything that can be done here?