TL;DR: UnzipLoops.jl (GitHub: Suzhou-Tongyuan/UnzipLoops.jl, "broadcast and unzip it!") provides a single function, `broadcast_unzip`. It works like broadcasting but also unzips the output in the same loop for better performance.

It still needs to wait 3 days to be registered; see "New package: UnzipLoops v0.1.0" by JuliaRegistrator (Pull Request #70918, JuliaRegistries/General).

# Where the story begins

When broadcasting a function `f` that has multiple outputs, a need for `unzip` arises.

```
f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]
out = f.(X, Y)
```

This `out` is of type `Vector{Tuple{Int,Float64}}` (`x/y` returns a `Float64` even for integer inputs). We often need to unzip it into a `Tuple{Vector{Int}, Vector{Float64}}`. In most cases, we can trivially split it apart:

```
function g(X, Y)
    out = f.(X, Y)
    return getindex.(out, 1), getindex.(out, 2)
end
```
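To make the element types concrete, here is a small self-contained check of the `getindex` split on the inputs defined earlier. It only uses Base, and the names `A` and `B` are introduced here for illustration.

```julia
f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]

out = f.(X, Y)                      # one loop, one array of tuples
@assert out isa Vector{Tuple{Int,Float64}}

# Two more loops (and two more allocations), one per output
A, B = getindex.(out, 1), getindex.(out, 2)
@assert A isa Vector{Int} && B isa Vector{Float64}
```

The first output keeps the integer element type while the second is always floating point, so the two result vectors have different element types.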

You may want to find some smart function to automatically unzip it so that `g` becomes `g(X, Y) = unzip(f.(X, Y))`.

But here’s the conclusion: **`unzip` alone, whether lazily or eagerly evaluated, isn’t the optimal solution**. Making `unzip` lazy can save one allocation, but the additional loops are still there.

```
using BenchmarkTools

X, Y = rand(1:5, 1024), rand(1:5, 1024)
@btime f.($X, $Y) # 3.834 μs (1 allocation: 16.12 KiB)
@btime g($X, $Y) # 5.388 μs (4 allocations: 32.41 KiB)
```
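For reference, an eager `unzip` could look roughly like the sketch below. The name `unzip_eager` is hypothetical (it is not part of Base or UnzipLoops.jl) and it only handles arrays of 2-tuples; the point is that it still walks the data once per output after the broadcast has already looped once.

```julia
# Hypothetical helper for illustration only: eagerly split an array of
# 2-tuples into two arrays. Each `map` call is a separate pass over `v`.
function unzip_eager(v::AbstractArray{<:Tuple{Any,Any}})
    return map(first, v), map(last, v)
end

f(x, y) = x^y, x/y
X, Y = [1, 2, 3, 4], [4, 3, 2, 1]

# Pass 1: the broadcast. Passes 2 and 3: inside unzip_eager.
A, B = unzip_eager(f.(X, Y))
```

A lazy variant would avoid materializing the intermediate array of tuples, but reading each output would still require its own traversal.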

By pre-allocating the output and manually writing the loops, we get a more performant version:

```
function g(X, Y)
    @assert size(X) == size(Y)
    Base.require_one_based_indexing(X, Y)
    # Both outputs are stored with the same promoted element type
    T = promote_type(Float64, eltype(X), eltype(Y))
    N = ndims(X)
    # Pre-allocate one array per output
    Z1 = Array{T,N}(undef, size(X))
    Z2 = Array{T,N}(undef, size(X))
    # Single pass: compute f once per element and store both outputs
    @inbounds @simd for i in eachindex(X)
        v = f(X[i], Y[i])
        Z1[i] = v[1]
        Z2[i] = v[2]
    end
    return Z1, Z2
end
@btime g($X, $Y) # 3.999 μs (2 allocations: 16.25 KiB)
```
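The manual loop above is tied to one particular `f` and to `Float64`-promoted outputs. A slightly more generic single-pass version might look like the sketch below. This is not the implementation used by UnzipLoops.jl; `unzip_map2` is a hypothetical name, and the sketch assumes that `f` always returns a 2-tuple with the same element types as for the first pair of inputs, and that the inputs are non-empty.

```julia
# Hypothetical single-pass sketch: apply `f` elementwise and store the two
# outputs directly into two pre-allocated arrays.
function unzip_map2(f, X::AbstractArray, Y::AbstractArray)
    size(X) == size(Y) || throw(DimensionMismatch("X and Y must have the same size"))
    isempty(X) && throw(ArgumentError("this sketch does not handle empty inputs"))

    # Probe the first element to pick the output element types
    # (assumption: `f` is type-stable across all elements).
    a1, b1 = f(first(X), first(Y))
    A = similar(X, typeof(a1))
    B = similar(X, typeof(b1))

    # One loop fills both outputs; the first element is computed twice.
    for i in eachindex(X, Y)
        a, b = f(X[i], Y[i])
        A[i] = a
        B[i] = b
    end
    return A, B
end

f(x, y) = x^y, x/y
A, B = unzip_map2(f, [1, 2, 3, 4], [4, 3, 2, 1])
```

Probing the first element keeps the sketch short, at the cost of one redundant call to `f` and an error if a later element produces a value that cannot be converted to the probed type.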

# `broadcast_unzip` saves the labor

Obviously, rewriting the trivial `getindex` solution as a verbose manual loop takes a lot of work and hurts readability. This is why `broadcast_unzip` is introduced: it combines broadcasting and unzip. Most importantly, it is simple to use and yet fast:

```
using UnzipLoops

g(X, Y) == broadcast_unzip(f, X, Y) # true
@btime broadcast_unzip(f, $X, $Y) # 4.009 μs (2 allocations: 16.25 KiB)
```

Additionally, `broadcast_unzip` accepts more inputs (just like `map`) as long as their sizes match and `f` outputs a `Tuple` of scalar-like objects.

```
X, Y, Z = rand(1:5, 1024), rand(1:5, 1024), rand(1:5, 1024)
f(x, y, z) = x ^ y ^ z, x / y / z, x * y * z, x / (y*z)
out = broadcast_unzip(f, X, Y, Z)
@assert out[1] == getindex.(f.(X, Y, Z), 1)
@btime map(f, $X, $Y, $Z) # 13.682 μs (2 allocations: 32.05 KiB)
@btime broadcast_unzip(f, $X, $Y, $Z) # 13.418 μs (6 allocations: 32.58 KiB)
```
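When checking results like the ones above, a slow but obviously correct reference can be handy. The sketch below builds one from Base only, for any number of inputs and outputs; `reference_unzip` is a hypothetical name, and it assumes non-empty inputs and that `f` always returns a tuple of the same length.

```julia
# Hypothetical reference implementation for correctness checks only:
# one pass for `map`, then one extra pass per output.
function reference_unzip(f, As...)
    out = map(f, As...)
    N = length(first(out))          # number of outputs of `f`
    return ntuple(i -> getindex.(out, i), N)
end

f(x, y, z) = x ^ y ^ z, x / y / z, x * y * z, x / (y*z)
X, Y, Z = [1, 2], [2, 1], [1, 2]
ref = reference_unzip(f, X, Y, Z)
@assert length(ref) == 4
```

A result from a single-pass implementation can then be compared against `ref` component by component.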

This is joint work (a by-product) by me and @thautwarm from developing packages at Tongyuan, and we believe the community needs this tool, too. As far as I know, although this is a common need, no such tool existed before.