Is there something like broadcast_mapreduce?

fabiangans · September 26, 2017, 8:16am

Assume I have 2 arrays x and y that can be broadcast to a common size. Is there any way to get a (zipped) iterator over the elements of both arrays as they would result from a broadcast operation? For example:

x = rand(2,5,3)
y = rand(1,5,1)
sum(x.*y)

Is there any way to obtain the sum without allocating the intermediate array by getting an iterator over the elements and calling mapreduce on it?

Tamas_Papp · September 26, 2017, 9:00am

Not necessarily, the MWE runs fine since the length 1 axes are just repeated.

fabiangans · September 26, 2017, 9:14am

I should have made this clearer. As @Tamas_Papp mentioned, I want this to work on arrays of different sizes with singleton dimensions repeated, exactly the way broadcast is defined…

Tamas_Papp · September 26, 2017, 9:24am

Try

x = rand(2,5,3)
y = rand(1,5,1)
sum(x.*y)

yourself. What @fabiangans needs is expansion of singleton dimensions, then broadcast proper, then the sum. I do not see a way to do it with the current setup, other than reusing the expansion framework some clever way with the generator approach you suggested above.

fabiangans · September 26, 2017, 2:10pm

I think this is up to you, but would probably make sense. I had a look at the broadcast source code and if no one else steps up with a solution I will try to tap into that system and adjust it to my needs.

Thanks anyway,

fabiangans · September 27, 2017, 8:39am

Thanks for the hint, I tried to adapt the expansion framework. I tried two approaches:

An iterator similar to zip, but based on the broadcast expansion to repeat singleton dimensions
A broadcast_reduce function, similar to mapreduce

Here are the results of some simple benchmarks:

using BenchmarkTools

x1 = rand(200,5000,30)
x2 = rand(1,5000,1)
x3 = rand(1,1,1)

# The allocating version
@btime sum(x1.*x2.*x3)
# 60.590 ms (57 allocations: 228.88 MiB)

# Iterator-based mapreduce
it = BroadIter(x1,x2,x3)
@btime mapreduce(x->x[1]*x[2]*x[3],+,it)
# 117.096 ms (3 allocations: 64 bytes)

# Iterator-based loop
function sum_iter(x1,x2,x3)
    it = BroadIter(x1,x2,x3)
    s = 0.0
    for (a1,a2,a3) in it
        s+=a1*a2*a3
    end
    s
end
@btime sum_iter(x1,x2,x3)
# 219.865 ms (4 allocations: 240 bytes)

# broadcast_reduce
@btime broadcast_reduce(*,+,0.0,x1,x2,x3)
# 134.958 ms (79 allocations: 1.92 KiB)

So the code is allocation-free and working correctly fro my use case, but still slower than the allocating version. I would guess that this is due to simd-optimizsations? I don’t think they are possible for the iterator approach. Any hints on how to improve one of the approaches would be welcome. The code that I adapted from broadcast.jl is here:

https://gist.github.com/meggart/3539a06f65f82d5149efcf90c37276b7

fabiangans · January 11, 2018, 2:07pm

Just in case someone stumbles upon this thread and to answer my own question. Today I thought about the problem again and I found a nice and fast way to do this by using the mutating broadcast! on a custom AbstractArray which does the reduction upon calling setindex! Here is my final implementation of broadcast_reduce which has no allocations and is faster than the allocating version:

#Define abstract Array which consumes values on setindex!
type ArrayReducer{T,N,F} <: AbstractArray{T,N}
    v::T
    size::NTuple{N,Int}
    op::F
end
ArrayReducer{T,N}(size::NTuple{N},v0::T,op)=ArrayReducer{T,N,typeof(op)}(v0,size,op)
Base.setindex!(x::ArrayReducer,v,i...) = x.v=x.op(x.v,v)
Base.size(a::ArrayReducer)=a.size
get_stop(a::Base.OneTo)=a.stop

#Define the function
function broadcast_reduce(f,op,v0,A,Bs...)
    shape = Base.Broadcast.broadcast_indices(A,Bs...)
    reducer = ArrayReducer(get_stop.(shape),v0,op)
    broadcast!(f,reducer,A,Bs...)
    reducer.v
end

# First check if both give the same result
x1 = rand(200,5000,30)
x2 = rand(1,5000,1)
x3 = rand(1,1,1)

println(sum(x1.*x2.*x3))
println(broadcast_reduce(*,+,0.0,x1,x2,x3))

#Then benchmark
using BenchmarkTools
@btime sum(x1.*x2.*x3)
#  109.582 ms (57 allocations: 228.88 MiB)

@btime broadcast_reduce(*,+,0.0,x1,x2,x3)
# 35.359 ms (46 allocations: 1.81 KiB)

josuagrw · March 1, 2018, 1:55pm

That seems really useful, maybe it could be added to a package? (Or become part of a new package?)

e3c6 · April 28, 2019, 11:21pm

ArrayReducer is very clever. To bump @josuagrw question, is this available in an “official” package?

fabiangans · April 29, 2019, 6:24am

No, I didn’t make an official package out of this. It might be a good addition to some other package though so feel free to grab the code snippet and use it wherever you want.

e3c6 · April 29, 2019, 8:49am

I think something like ArrayReducer could be useful in https://github.com/JuliaLang/julia/issues/19198 or https://github.com/JuliaLang/julia/pull/31020.

Topic		Replies	Views
Non-allocating reduced broadcast? General Usage question	4	1288	April 29, 2019
Mapreduce with broadcasting Performance	9	773	March 2, 2022
Avoid intermediate arrays in reduction of broadcast General Usage question	11	748	May 25, 2019
Is it possible to do a `mapreduce` with multiple arrays while broadcasting over so General Usage	4	592	March 18, 2021
`broadcast` with iterators? General Usage	3	593	June 30, 2017

Is there something like broadcast_mapreduce?

Related topics