The best strategy to parallelize something like this is to come up with a pair of functions:
-
f
: a function that creates a singleton solution given a single element- Imagine you have an array with single element. What is the answer you want?
-
op
: an associative function that merges two solutions- How to merge two outputs of
f
? How to make the output ofop
re-feedable to both of the argument positions?
- How to merge two outputs of
and then feed them to mapreduce(f, op, xs)
.
The associativity of the second function op
is the most important ingredient in the data parallel programming. Unfortunately, the definition of findmax2(prev, curr)
in the OP requires different types for the first and the second arguments. So, it’s not an associative function.
There are multiple ways to implement this. Probably the easiest way to do it is to use tuples as the input of op
. The “merge” operation is then simply concatenation, sorting and truncation:
using Baselet
function merge_max2(a, b)
x, y = Baselet.sort((a..., b...); by = last, rev = true)
return (x, y)
end
mapreduce(tuple, merge_max2, pairs([3, 0, 5, 6, 1])) # (4 => 6, 3 => 5)
This approach is rather nice since it’s possible to generalize to n
max/min elements (for small n
).
(Note: Baselet.sort
is not the only function that implements sorting for “small” container. For example, StaticArrays.jl also implements sorting. See also TupleTools.jl. It might be better to use partialsort
instead if available.)
You can also use @floop
to do this since @floop
is “just” a syntax sugar to (a generalization of) mapreduce
:
function findmax2_floop(xs; ex)
@floop ex for (i, x) in zip(eachindex(xs), xs)
b = tuple(i => x)
@reduce() do (a; b)
x, y = Baselet.sort((a..., b...); by = last, rev = true)
a = (x, y)
end
end
end
On GPU, you might want to set the initial value. Something like init = ((firstindex(xs)-1 => typemin(xs)), (firstindex(xs)-1 => typemin(xs)))
might work.
BTW, this @floop
usage
@floop ex for (i, x) in zip(eachindex(xs), xs)
@reduce() do (xmax1=xtypemin; x), (imax1=-1; i)
if isless(xmax1, x)
imax2, xmax2 = imax1, xmax1
imax1, xmax1 = i, x
end
end
i == imax1 && continue
is not correct; nothing should “leak out” from inside @reduce
. I should document this clearly and ideally throw an error with helpful error message.