Sum performance for Array{Float64,2} elements

Relieved and ashamed :smile:: I made a mistake in the nflops function (BigInt(n) must be replaced by BigInt(n)*N).

As a consequence, all the Julia performance figures must be multiplied by 4, and there is no longer any discrepancy with the C++ results. In the end, my 16 GFlops expectation was correct.

[allsum plot: updated benchmark results]

Nevertheless, this mistake was a lucky one for me, since I learned a lot from your advice:

  • Simple is beautiful is efficient: the most elegant loop in Julia is the top performer (sum_rbs/ei); a sketch of this kind of loop is included after this list.
  • The BLAS1 (vector) loop in Julia competes with the best C/C++ implementations, and the automatic vectorization is very good (note that g++ 6.4 failed to vectorize this loop properly, which divides its performance by 4). This kind of quantitative comparison will be very useful for me when introducing Julia to my colleagues with an HPC background.
  • I learned to use SIMD.jl, which I guess can be very neat and useful for vectorizing more complex computations. (I used the M=16 version of Gunnar’s function for the plot; a SIMD.jl sketch also follows this list.)
  • I learned to call C/C++ functions from Julia and found that it was really easy.
    For information, the Eigen-based function being called corresponds to the following implementation:
extern "C" {
  double eigensum(double * ptr,int N){
    typedef Eigen::Array<double,Eigen::Dynamic,1> EA;
    Eigen::Map<EA> em(ptr,N);
    return em.sum();
  }
}

and the ccall looks like:

# a.data is the underlying Float64 buffer of the TBSM wrapper used in this thread
function msum_eigen(a::TBSM{Float64, N}) where N
    ccall((:eigensum, "./libesum"), Cdouble, (Ptr{Cdouble}, Cint), a.data, length(a.data))
end
  • Remark 1 on Julia: considering the Eigen wrapper, where I could have used
    Eigen::Map<EA,Eigen::Aligned64> em(ptr,N);
    and the vloada variant of SIMD.jl's vload, I think it would be useful if the Julia array allocator could be asked to enforce a memory alignment compatible with the SIMD instruction set in use (see the alignment-aware sketch after this list).
  • Remark 2 on Discourse: it is a pity that I can’t upload SVG or PDF images :grinning:
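
For reference, here is a minimal sketch of the kind of plain Julia loop mentioned in the first bullet; the exact sum_rbs/ei definitions are earlier in the thread, so the name sum_simple and its details are only my illustration. With @inbounds and @simd the compiler is free to auto-vectorize the reduction, which is what makes this simple loop competitive with the BLAS1-style C/C++ versions.

function sum_simple(a::AbstractArray{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(a)   # @simd allows reordering the reduction so it can vectorize
        s += a[i]
    end
    return s
end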
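
In the same spirit, a hedged sketch of an explicit SIMD.jl version, loosely following Gunnar's M=16 idea (his real function is in the thread above; vsum here is my own simplified variant and assumes length(a) is a multiple of the vector width M). It also illustrates the alignment remark: assuming vloada requires the pointer to be aligned to the full vector width, it can only be preferred over vload when the data pointer happens to satisfy that, which cannot currently be requested from the Julia allocator.

using SIMD

# Sketch only (not Gunnar's exact function): one SIMD accumulator of width M.
function vsum(a::Vector{Float64}, ::Type{Vec{M,Float64}}) where {M}
    @assert length(a) % M == 0
    # Alignment is an allocator implementation detail, so check at run time
    # before preferring the aligned load vloada over plain vload.
    aligned = UInt(pointer(a)) % (M * sizeof(Float64)) == 0
    acc = Vec{M,Float64}(0.0)
    @inbounds for i in 1:M:length(a)
        acc += aligned ? vloada(Vec{M,Float64}, a, i) : vload(Vec{M,Float64}, a, i)
    end
    return sum(acc)
end

# e.g. vsum(vec(rand(1024, 1024)), Vec{16,Float64})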

Thank you again for your help!
Laurent
