Avoid temporary array creation for the argument of `reduce`

fjarri · June 26, 2017, 7:42am

It is common to reduce (e.g. sum) the result of an array expression without needing the resulting array itself. In this case Julia will still create a temporary array to pass to the reducing function, even if you managed to avoid temporary arrays up to that point by taking advantage of loop fusion. Is there some way to skip that allocation for reductions? Perhaps some macro or specialized function?

If the reduced expression has only one array argument, one can use mapreduce(), but since it only accepts one iterator, an expression with several arrays requires zip(), which degrades performance (it also looks somewhat ugly without tuple destructuring). As an example:

function test_vectorized(x, y)
    sqrt(sum((x .- y) .^ 2))
end

function test_devectorized(x, y)
    s = zero(eltype(x))
    @inbounds @simd for i in 1:length(x)
        s += (x[i] - y[i])^2
    end
    sqrt(s)
end

function test_mapreduce(x, y)
    sqrt(mapreduce(p -> (p[1]-p[2])^2, +, zip(x, y)))
end


x = rand(100000000)
y = rand(100000000)

r1 = test_vectorized(x, y)
@time test_vectorized(x, y)

r2 = test_devectorized(x, y)
@time test_devectorized(x, y)

r3 = test_mapreduce(x, y)
@time test_mapreduce(x, y)

@assert isapprox(r1, r2)
@assert isapprox(r1, r3)

The result:

  0.575121 seconds (87 allocations: 762.946 MiB, 18.08% gc time)
  0.094775 seconds (5 allocations: 176 bytes)
  0.140150 seconds (9 allocations: 272 bytes)

Is there a better way to do this kind of calculation without resorting to devectorization?
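
For reference, the 762.946 MiB reported for test_vectorized() is about the size of one temporary Float64 array of 10^8 elements (8 × 10^8 bytes ≈ 762.94 MiB), which is consistent with the fused broadcast allocating exactly one array before sum() runs. A minimal sketch for checking this on smaller inputs; the names vec_dist and loop_dist are made up for this example, and the exact byte counts will vary between Julia versions:

```julia
# Sketch only: `vec_dist` and `loop_dist` are hypothetical names for this example.
vec_dist(x, y) = sqrt(sum((x .- y) .^ 2))   # fused broadcast, then sum over a temporary

function loop_dist(x, y)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x, y)
        s += (x[i] - y[i])^2
    end
    sqrt(s)
end

x = rand(1000)
y = rand(1000)

# Warm up first so that compilation is not counted as allocation.
vec_dist(x, y); loop_dist(x, y)

a_vec  = @allocated vec_dist(x, y)    # expected: at least 8 bytes per element
a_loop = @allocated loop_dist(x, y)   # expected: far less, no temporary array
```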

nalimilan · June 26, 2017, 1:48pm

You can do sqrt(sum((a - b)^2 for (a, b) in zip(x, y))), but it’s quite verbose and I’m not sure how it will perform. A reduction syntax which would automatically be combined with dot broadcasting operations has been discussed, but no solution exists yet. See for example this comment.
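
A minimal sketch of the generator form on small inputs, in case someone wants to try it; the function name dist_generator is made up here, and no claim is made about how it performs relative to the other versions:

```julia
# Sketch only: `dist_generator` is a hypothetical name for this example.
function dist_generator(x, y)
    # `sum` consumes the generator lazily, so no array of squared
    # differences is built.
    sqrt(sum((a - b)^2 for (a, b) in zip(x, y)))
end

x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 0.0]
d = dist_generator(x, y)   # sqrt(0 + 4 + 9)
```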

Dan · June 26, 2017, 2:13pm

Another way to do this is: norm(x-y). It is the most character economical but allocates another array. Perhaps an iterator/stream/generator version of norm can be defined.

IMHO the mapreduce version is fast enough but the indexing is annoying aesthetically, so I defined:

import Base: mapreduce
mapreduce(f, op, v0, itr1, itr2) = mapreduce(t->f(t[1],t[2]),op,v0, zip(itr1,itr2))

and then the mapreduce flows better:

sqrt(mapreduce((x,y)->(x-y)^2,+,0.0,x,y))

And runs quite decently fast.
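
Note that adding a method to Base.mapreduce for arbitrary argument types is type piracy and may conflict with methods Base itself defines. A sketch of the same idea as a separate helper, assuming Julia 1.x, where the initial value is passed as the init keyword; the name mapreduce2 is made up for this example:

```julia
# Hypothetical helper, not part of Base. Assumes Julia 1.x (`init` keyword).
mapreduce2(f, op, init, itr1, itr2) =
    mapreduce(t -> f(t[1], t[2]), op, zip(itr1, itr2); init = init)

x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 0.0]

# Same call shape as above, but without touching Base.mapreduce.
d = sqrt(mapreduce2((a, b) -> (a - b)^2, +, 0.0, x, y))
```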

fjarri · June 27, 2017, 2:44am

Better than test_vectorized(), but worse than test_mapreduce(), unfortunately.

Thanks for the link, glad to see I’m not the only one who needs this functionality.

fjarri · June 27, 2017, 2:48am

It is actually as slow as test_vectorized(), because norm() is still a fusion boundary. Plus, the norm is just an example, I have various expressions I need to sum.

Yes, that’s what I ended up with as well. Still 50% slower than devectorized code, but at least not as slow as just sum().

oschulz · August 30, 2019, 10:46pm

Just stumbled on this old discussion - and I was wondering, since a lot has changed since then, esp. with broadcasting, is there work going on regarding elision of temporary arrays?

It's just so … imperfect … that sum(log.(A)) allocates a temporary array (mapreduce(log, +, A) won’t, of course). Nowadays, we could use

sum(Base.broadcasted(log, A))

It works, but it’s kinda verbose, and doesn’t support dot-notation. If there was a way to say “don’t materialize my broadcast here” … is there something going on in that direction? I’d be surprised if people haven’t thought about it already (some probably in depth).
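
A sketch of how this extends to the two-array example from the start of the thread, assuming Julia 1.x. broadcasted and instantiate are unexported functions in Base.Broadcast; calling instantiate before reducing is a precaution taken here, not something stated above, and lazy_sqdist is a made-up name:

```julia
# Sketch assuming Julia 1.x; these are unexported Base.Broadcast functions.
using Base.Broadcast: broadcasted, instantiate

# Hypothetical helper: builds a lazy broadcast object and reduces it
# without materializing the array of squared differences.
lazy_sqdist(x, y) = sqrt(sum(instantiate(broadcasted((a, b) -> (a - b)^2, x, y))))

A = [1.0, 2.0, 4.0]
s_lazy  = sum(instantiate(broadcasted(log, A)))   # no log.(A) array is created
s_eager = sum(log.(A))                            # materializes log.(A) first

x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 0.0]
d = lazy_sqdist(x, y)
```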

It might also help if Base.broadcasted were a (read-only) AbstractArray, e.g. to allow multi-threaded operation. Or alternatively, if there was a standard option to materialize it as some kind of mapped-array type. But maybe this has been considered already (and possibly been rejected for good reasons)?

oschulz · August 30, 2019, 11:03pm

Only in cases where all bcast arguments are arrays, of course (via bcast style?).

tkf · August 30, 2019, 11:30pm

Using @~ from LazyArrays.jl, you can write sum(@~ log.(A)). In Julia itself, the discussion is happening at:

github.com/JuliaLang/julia: "Notation for lazy map" (opened 06:46PM - 02 Nov 16 UTC by cossio, label: broadcast)

The notation `f.(v)` effectively maps `f` over an array `v`. This operation is not lazy, it materializes the mapped array. I think it could be useful to have an analogous notation for lazy maps. Maybe `f..(v)`? Instead of materializing the mapped array, this should return something equivalent to the generator: `(f(x) for x in v)`. For example, this could be useful to do something like: `sum(f..(v))`, which is more efficient than materializing the intermediate array in: `sum(f.(v))`. Of course right now `sum` takes an optional function argument, so one can write `sum(f,v)`. But see the discussion here: https://github.com/JuliaLang/julia/issues/19146. If one decides to remove the method `sum(f,v)`, I think the notation `sum(f..(v))` could be a nice alternative.

See:

github.com/JuliaLang/julia: "Arraylike: AbstractArrays without an eltype" (opened 03:24PM - 17 Aug 19 UTC by bramtayl, closed 06:47PM - 21 Jan 20 UTC)

Would be useful for many packages, JuliennedArrays, LightQuery, SplitApplyCombine, MappedArrays, Query, etc. Base already has an AbstractEltypelessArray: the lazy broadcast machinery, it would just be a matter of formalizing the interface. Given that traits aren't coming any time soon, it couldn't be `<: AbstractArray`, so we would have to duplicate all of the machinery for `AbstactArrays`. I could try to build a prototype if people are interested?

oschulz · August 31, 2019, 10:03am

Thanks for the links, @tkf! I was pretty sure that there would be a discussion about this going on somewhere, just didn’t manage to find it for some reason.
