I want to perform a mapreduce over multiple iterators that don’t have the same dimensions, but which can be broadcast together. For example:
arr1 = rand(1, 1, 3)
arr2 = rand(100, 100, 1)
mapreduce(+, *, arr1, arr2; dims=(1, 2))
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(1), Base.OneTo(1), Base.OneTo(3)), b has dims (Base.OneTo(100), Base.OneTo(100), Base.OneTo(1)), mismatch at 1")
Is this possible? Or, if not, are there any Julia utilities that can 'expand' the dimensions of an array along one or more axes without allocating the expanded memory?
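A minimal CPU sketch of the LazyArrays approach that the next reply refers to (the lazy wrapper avoids ever allocating the full broadcast result):

using LazyArrays
arr1 = rand(1, 1, 3)
arr2 = rand(100, 100, 1)
# wrap the broadcast lazily; no (100, 100, 3) array is materialized
lazy = LazyArray(@~ arr1 .+ arr2)
reduce(*, lazy; dims=(1, 2))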
Thanks for that suggestion. It seems to work for CPU code, but the ultimate destination for this mapreduce is the GPU, and CUDA's interaction with LazyArrays seems to trigger scalar indexing.
For example:
using CUDA
using LazyArrays
arr1 = CUDA.rand(100, 100, 1)
arr2 = CUDA.rand(1, 1, 3)
arr3 = LazyArray(@~ arr1 .+ arr2)
reduce(+, arr3; dims=(1, 2))
Warning: Performing scalar indexing on task Task (runnable)
(The reason I'm trying to use mapreduce or reduce here is that these functions are consistently faster than the manual reductions I'm able to write myself.)
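By a 'manual reduction' I mean something along these lines, sketched here on the CPU for the shapes above:

function manual_reduce(arr1, arr2)
    # hand-written equivalent of reduce(+, arr1 .+ arr2; dims=(1, 2))
    # for the (100, 100, 1) and (1, 1, 3) shapes above
    out = zeros(eltype(arr1), 1, 1, size(arr2, 3))
    for k in axes(arr2, 3), j in axes(arr1, 2), i in axes(arr1, 1)
        out[1, 1, k] += arr1[i, j, 1] + arr2[1, 1, k]
    end
    return out
end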
@maleadt I saw this commit suggesting that mapreduce in GPUArrays already supports broadcasting. Is that correct, and if so, is there a flag or similar required to enable it?
@jling Yeah, it's a poor minimal example, but it does work. The error you're seeing is due to > having no defined identity element; if you provide an init=0, for example, it will work.
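A minimal illustration of that point, on hypothetical data rather than the original example:

A = rand(3, 4)
reduce(>, A; dims=1)            # errors: > has no identity element to start from
reduce(>, A; dims=1, init=0.0)  # works once an explicit init is supplied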
The actual use case here is a type of 2D Fourier transform. It's a bit too heavy to get into the details, but the map operation is a complex exponential, and the reduction sums across all the data points onto a 2D grid.
I have one working workaround, but I don't know if it's optimal. It does, however, have the nice property that it works on the GPU too:
function broadcast_reduce(f, op, args...; dims, init)
    # build a lazy Broadcasted object; nothing is materialized here
    bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...))
    return reduce(op, bc; dims, init)
end
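For example, here is how the Fourier-style use case above might look with it (all names and shapes are hypothetical): point data broadcast along the third dimension against a 2D frequency grid, then summed over the points.

N, Nk = 1000, 64
x  = reshape(rand(Float32, N), 1, 1, N)   # point coordinates
y  = reshape(rand(Float32, N), 1, 1, N)
v  = reshape(rand(Float32, N), 1, 1, N)   # point values
kx = reshape(collect(Float32, range(-pi, pi, length=Nk)), Nk, 1, 1)
ky = reshape(collect(Float32, range(-pi, pi, length=Nk)), 1, Nk, 1)
# map is a complex exponential; the reduction sums every point onto the grid
phase(kx, ky, x, y, v) = v * cis(-(kx * x + ky * y))
F = broadcast_reduce(phase, +, kx, ky, x, y, v; dims=3, init=0.0f0 + 0.0f0im)
size(F)  # (64, 64, 1)

The same call runs with CuArray inputs in place of the CPU arrays, without materializing the full (64, 64, 1000) intermediate.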
Kind of, but there's no user-facing interface. This should be done with LazyArrays; we're probably just missing some dispatches for those lazy array types to be compatible with the GPU broadcast implementation.