Avoid temporary array creation for the argument of `reduce`

It is common to reduce (e.g. sum) the result of an array expression without needing the whole resulting array itself. In this case Julia will still create a temporary array to pass to the function, even if you managed to avoid temporary arrays before by taking advantage of loop fusion. Is there some way to skip that for reduction somehow? Perhaps some macro or specialized function?

If the reduced expression has only one array argument, one can use mapreduce(), but since it only accepts one iterator, for an expression with several arrays zip() is required which degrades the performance (it also looks somewhat ugly without tuple destructuring). As an example:

function test_vectorized(x, y)
    sqrt(sum((x .- y) .^ 2))

function test_devectorized(x, y)
    s = zero(eltype(x))
    @inbounds @simd for i in 1:length(x)
        s += (x[i] - y[i])^2

function test_mapreduce(x, y)
    sqrt(mapreduce(p -> (p[1]-p[2])^2, +, zip(x, y)))

x = rand(100000000)
y = rand(100000000)

r1 = test_vectorized(x, y)
@time test_vectorized(x, y)

r2 = test_devectorized(x, y)
@time test_devectorized(x, y)

r3 = test_mapreduce(x, y)
@time test_mapreduce(x, y)

@assert isapprox(r1, r2)
@assert isapprox(r1, r3)

The result:

  0.575121 seconds (87 allocations: 762.946 MiB, 18.08% gc time)
  0.094775 seconds (5 allocations: 176 bytes)
  0.140150 seconds (9 allocations: 272 bytes)

Is there a better way to do this kind of calculation without resorting to devectorization?

You can do sqrt(sum((a - b)^2 for (a, b) in zip(x, y))), but it’s quite verbose and I’m not sure how it will perform. A reduction syntax which would automatically be combined with dot broadcasting operations has been discussed, but no solution exists yet. See for example this comment.

1 Like

Another way to do this is: norm(x-y). It is the most character economical but allocates another array. Perhaps an iterator/stream/generator version of norm can be defined.

IMHO the mapreduce version is fast enough but the indexing is annoying aesthetically, so I defined:

import Base: mapreduce
mapreduce(f, op, v0, itr1, itr2) = mapreduce(t->f(t[1],t[2]),op,v0, zip(itr1,itr2))

and then the mapreduce flows better:


And runs quite decently fast.

1 Like

Better than test_vectorized(), but worse than test_mapreduce(), unfortunately.

Thanks for the link, glad to see I’m not the only one who needs this functionality.

It is actually as slow as test_vectorized(), because norm() is still a fusion boundary. Plus, the norm is just an example, I have various expressions I need to sum.

Yes, that’s what I ended up with as well. Still 50% slower than devectorized code, but at least not as slow as just sum().

Just stumbled on this old discussion - and I was wondering, since a lot has changed since then, esp. with broadcasting, is there work going on regarding elision of temporary arrays?

It just so … imperfect … that sum(log.(A)) allocates a temporary array (mapreduce(log, +, A) won’t, of course). Nowadays, we could use

sum(Base.broadcasted(log, A))

It works, but it’s kinda verbose, and doesn’t support dot-notation. If there was a way to say “don’t materialize my broadcast here” … is there something going on in that direction? I’d be surprised if people haven’t thought about it already (some probably in depth).

It might also be help if Base.broadcasted were a (read-only) AbstractArray, e.g. to allow multi-threaded operation. Or alternatively, if there was a standard option to materialize it as a some kind of mapped-array type. But maybe this has been considered too already (and possibly been rejected for good reasons)?

Only in cases were all bcast arguments are arrays, of course (via bcast style?).

Using @~ from LazyArrays.jl, you can write sum(@~ log.(A)). In Julia itself, the discussion is happening at:



Thanks for the links, @tkf! I was pretty sure that there would be a discussion about this going on somewhere, just didn’t manage to find it for some reason.

1 Like