Different performance between reduce(map()) and mapreduce()

gwlagerweij · August 2, 2022, 8:37am

Hi all,

I want to create an array of indices (for creation of a sparse matrix in a later stage) based on a list of element connectivity as follows:

N     = 10000;                               # Number of elements
elems = collect( [i, i+1] for i in 1:N);     # List of element connectivity

I = reduce(vcat, map(e -> e[[1, 2, 1, 2]], elems));
J = reduce(vcat, map(e -> e[[1, 1, 2, 2]], elems));

For this simplified example (in reality elems may not have this simple structure), the output should be

elems = [[1, 2], [2, 3], [3, 4], ...]
I = [1, 2, 1, 2, 2, 3, 2, 3, 3, 4, 3, 4, ...]
J = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, ...]

Performance of mapreduce()

The performance of the variant shown above is unexpectedly much better than that using mapreduce(). Consider the same N and elems as above, then the performance I get for the two variants is

@btime begin
    I = reduce(vcat, map(e -> e[[1, 2, 1, 2]], elems));
    J = reduce(vcat, map(e -> e[[1, 1, 2, 2]], elems));
end

>> 1.789 ms (40014 allocations: 4.58 MiB)

Whereas with mapreduce, the time and allocated memory are much larger.

@btime begin
    I = mapreduce(e -> e[[1, 2, 1, 2]], vcat, elems);
    J = mapreduce(e -> e[[1, 1, 2, 2]], vcat, elems);
end

>> 25.126 ms (63644 allocations: 199.55 MiB)

The mapreduce() documentation suggests that it should be faster than reduce(map()) because of the lack of intermediate allocation of the map. Does anyone know why this is not the case?

Further improving performance

I am fairly new to Julia, and don’t know if it would be possible to further improve the performance of this piece of code (that is, reduce its execution time and allocated memory). It is a very performance-critical part of a bigger project, so I would be grateful if you could give me some pointers.

Many thanks in advance, Gijs.

nilshg · August 2, 2022, 9:11am

Don’t know anything about mapreduce (maybe because I never use map, see below), but I can get a factor of two from performing the getindex call with getindex rather than the anonymous function (which maybe allocates an extra vector on every call?):

julia> @btime begin
           I = reduce(vcat, map(e -> e[[1, 2, 1, 2]], elems))
           J = reduce(vcat, map(e -> e[[1, 1, 2, 2]], elems))
       end;
  1.061 ms (40014 allocations: 4.58 MiB)

julia> @btime begin
           I = reduce(vcat, getindex.($elems, Ref([1, 2, 1, 2])))
           J = reduce(vcat, getindex.($elems, Ref([1, 1, 2, 2])))
       end;
  627.100 μs (20016 allocations: 2.75 MiB)

aplavin · August 2, 2022, 9:25am

This should get rid of almost all allocations and get a speedup of a factor of a few:

@btime let
       ixs = [1, 2, 1, 2]
       @views reduce(vcat, map(e -> e[ix], $elems))
end

nilshg · August 2, 2022, 9:30am

Ah yes a view makes much more sense here, for a direct comparison:

julia> @btime begin
           I = reduce(vcat, view.($elems, Ref([1, 2, 1, 2])))
           J = reduce(vcat, view.($elems, Ref([1, 1, 2, 2])))
       end;
  263.000 μs (16 allocations: 1.37 MiB)

so about another factor of 2 with views.

skleinbo · August 2, 2022, 9:40am

reduce has a specialization for vcat where it preallocates an output array first and then loops over to fill it. mapreduce isn’t so clever here.

See Optimized methods for `mapreduce(f, vcat, xs)` and `mapreduce(f, hcat, xs)` · Issue #31137 · JuliaLang/julia · GitHub

It is a bit unfortunate that specializations for vcat/hcat are not mentioned in the documentation AFAICT.

gwlagerweij · August 2, 2022, 9:53am

That makes sense, I will stick with reduce then. Since this is the answer to the question, I’ll mark this as solution.

Many thanks to @nilshg and @aplavin for their input. I see there is still much to learn about Julia. The performance of the complete matrix assembly function has increased by a factor of roughly three, and there should be more opportunities to apply these newfound optimizations.
If anyone has other suggestions, I will be happy to accept these

DNF · August 2, 2022, 10:18am

It’s also an issue that your code is using global variables, and generally working in global scope. To get reliable benchmarks, you should put your code in a function and avoid globals.

DNF · August 2, 2022, 10:55am

BTW, there is special syntax for this, called an ‘array comprehension’:

[[i, i+1] for i in 1:N]

Topic		Replies	Views
Mapreduce performance and dispatch Performance question	0	209	July 7, 2023
Mapreduce over selected indices Performance	3	617	May 13, 2018
Underperformant mapreduce Performance	0	223	July 1, 2022
Mapreduce slower than sum General Usage question	2	456	March 17, 2020
Sum, mapreduce and broadcasted Performance broadcast	14	2890	September 23, 2018

Different performance between reduce(map()) and mapreduce()

Performance of mapreduce()

Further improving performance

Related topics