TL;DR: UnzipLoops.jl (GitHub: Suzhou-Tongyuan/UnzipLoops.jl, "broadcast and unzip it!") provides a single function, broadcast_unzip. It works like broadcasting but unzips the output in the same loop for better performance.
It still needs to wait 3 days to be registered: see "New package: UnzipLoops v0.1.0" by JuliaRegistrator, Pull Request #70918 in JuliaRegistries/General.
Where the story begins
When broadcasting a function f that has multiple outputs, a need for unzip arises.
f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]
out = f.(X, Y)
This out is of type Vector{Tuple{Int,Float64}} (x^y stays an Int, while x/y is always a Float64). It's often the case that we need to unzip it into a Tuple{Vector{Int}, Vector{Float64}}. In most cases, we can trivially split it apart:
function g(X, Y)
    out = f.(X, Y)
    return getindex.(out, 1), getindex.(out, 2)
end
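The element types involved can be checked directly. The snippet below is a small sketch that repeats the definitions above so it runs on its own; it uses only Base functions and spells the types with Int, so it doesn't depend on the machine's word size.

```julia
f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]

out = f.(X, Y)

# Each element is a tuple: integer power first, floating-point quotient second.
@show eltype(out)

# Splitting with getindex gives one array per tuple component.
powers, ratios = getindex.(out, 1), getindex.(out, 2)
@show typeof(powers) typeof(ratios)
```

Integer division with / always produces a Float64 in Julia, which is why the second component is not an Int.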
You may want to find some smart function that unzips it automatically, so that g becomes g(X, Y) = unzip(f.(X, Y)).
But here's the conclusion: unzip alone, whether it is lazily or eagerly evaluated, isn't the optimal solution. Making unzip lazy can save one allocation, but the additional loops are still there.
using BenchmarkTools

X, Y = rand(1:5, 1024), rand(1:5, 1024)
@btime f.($X, $Y) # 3.834 μs (1 allocation: 16.12 KiB)
@btime g($X, $Y) # 5.388 μs (4 allocations: 32.41 KiB)
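To make the extra passes concrete, here is a minimal sketch of what an eager unzip could look like. The name unzip and its definition are assumptions for illustration only; neither Base nor UnzipLoops.jl provides this function.

```julia
# Hypothetical eager unzip; neither Base nor UnzipLoops.jl defines this function.
# It assumes A is non-empty and every element is a tuple of the same length.
function unzip(A::AbstractArray{<:Tuple})
    n = length(first(A))
    # One full pass over A per output component, each allocating a new array.
    return ntuple(i -> getindex.(A, i), n)
end

f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]

# f.(X, Y) makes the first pass and allocates the array of tuples;
# unzip then makes two more passes to split it apart.
powers, ratios = unzip(f.(X, Y))
```

However the helper is written, it can only start once the array of tuples exists, so the intermediate allocation and the extra traversals remain.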
By pre-allocating the output and manually writing the loops, we get a more performant version:
function g(X, Y)
    @assert size(X) == size(Y)
    Base.require_one_based_indexing(X, Y)
    # Both outputs are stored with a common floating-point element type.
    T = promote_type(Float64, eltype(X), eltype(Y))
    N = ndims(X)
    Z1 = Array{T,N}(undef, size(X))
    Z2 = Array{T,N}(undef, size(X))
    # A single loop computes f once per element and fills both outputs.
    @inbounds @simd for i in eachindex(X)
        v = f(X[i], Y[i])
        Z1[i] = v[1]
        Z2[i] = v[2]
    end
    return Z1, Z2
end
@btime g($X, $Y) # 3.999 μs (2 allocations: 16.25 KiB)
broadcast_unzip saves the manual work
Obviously, rewriting the trivial getindex solution as a verbose manual loop takes a lot of work and hurts readability. This is why broadcast_unzip is introduced: it's a combination of broadcasting and unzip. Most importantly, it is simple to use and yet fast:
using UnzipLoops

g(X, Y) == broadcast_unzip(f, X, Y) # true
@btime broadcast_unzip(f, $X, $Y) # 4.009 μs (2 allocations: 16.25 KiB)
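For readers curious how the manual loop generalizes to any number of inputs and outputs, here is a rough sketch. The name unzip_map is hypothetical, and this is not the actual UnzipLoops.jl implementation; it assumes non-empty inputs with identical axes and an f whose output types match those of its first result.

```julia
# Hypothetical helper, NOT the actual UnzipLoops.jl implementation.
function unzip_map(f, X::AbstractArray, Ys::AbstractArray...)
    all(Y -> axes(Y) == axes(X), Ys) ||
        throw(DimensionMismatch("all inputs must share the same axes"))
    isempty(X) &&
        throw(ArgumentError("this sketch needs at least one element"))

    # Evaluate f once up front to learn the number and types of the outputs.
    # (The first element is therefore computed twice; fine for a sketch.)
    i1 = first(eachindex(X, Ys...))
    v1 = f(X[i1], map(Y -> Y[i1], Ys)...)
    outs = map(v -> similar(X, typeof(v)), v1)

    # Single pass: compute f once per index and scatter the tuple components.
    for i in eachindex(X, Ys...)
        v = f(X[i], map(Y -> Y[i], Ys)...)
        foreach((o, x) -> (o[i] = x), outs, v)
    end
    return outs
end

f(x, y) = x^y, x/y
powers, ratios = unzip_map(f, [1, 2, 3, 4], [4, 3, 2, 1])
```

Allocating the outputs from the types of the first result keeps the sketch short, but it will throw a conversion error if a later call returns a value that does not fit those types.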
Additionally, broadcast_unzip accepts more inputs (just like map) as long as their sizes match and f outputs a Tuple of scalar-like objects.
X, Y, Z = rand(1:5, 1024), rand(1:5, 1024), rand(1:5, 1024)
f(x, y, z) = x ^ y ^ z, x / y / z, x * y * z, x / (y*z)
out = broadcast_unzip(f, X, Y, Z)
@assert out[1] == getindex.(f.(X, Y, Z), 1)
@btime map(f, $X, $Y, $Z) # 13.682 μs (2 allocations: 32.05 KiB)
@btime broadcast_unzip(f, $X, $Y, $Z) # 13.418 μs (6 allocations: 32.58 KiB)
This is joint work by me and @thautwarm, a by-product of developing packages at Tongyuan, and we believe the community needs this tool, too. As far as I know, although this is a common need, such a tool didn't exist before.