Where possible and sensible, allow `zip` to pass `getindex` through to its underlying iterables

zip is a useful function, but it can be frustrating because it does not behave like a “real” array even when its underlying iterables are. For instance, I would expect that if all underlying iterables passed to zip support getindex, then so would the Zip, by calling it in turn on each underlying iterable and then collecting the results into a tuple. But this is not the case:

julia> a = 1:10
1:10

julia> a[4]
4

julia> zip(a, a)[4]  # I'd expect this to be (a[4], a[4]) == (4, 4)
ERROR: MethodError: no method matching getindex(::Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}, ::Int64)

This makes composing functionality with Zips much harder than it could be. Similarly, eachindex, keys, findall, etc. do not work on Zips. Perhaps there could be a new function/type pair, indexed_zip/IndexedZip, that expects the underlying iterables to support getindex and forwards the related functions to each of them before collecting the results into a tuple? The implementation would look something like this:

Base.getindex(z::Iterators.IndexedZip, i) = (it -> getindex(it, i)).(z.is)
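To make the one-liner above concrete, here is a minimal, self-contained sketch of what such a wrapper might look like. Note that `IndexedZip` and `indexed_zip` are hypothetical names and do not exist in Base or `Base.Iterators`; the field name `is` simply mirrors the one used above. The sketch also assumes that all underlying collections share the same indices, and it only forwards `getindex`, `eachindex`, and `keys`.

```julia
# Hypothetical type: not part of Base or Base.Iterators.
struct IndexedZip{T<:Tuple}
    is::T  # the underlying indexable collections
end

# Hypothetical constructor, analogous to `zip`.
indexed_zip(its...) = IndexedZip(its)

# Index each underlying collection in turn and collect the results into a tuple.
Base.getindex(z::IndexedZip, i) = map(it -> it[i], z.is)

# `eachindex` on several arrays throws a DimensionMismatch if their indices differ.
Base.eachindex(z::IndexedZip) = eachindex(z.is...)

# Assumption: every collection has the same keys as the first one.
Base.keys(z::IndexedZip) = keys(first(z.is))

z = indexed_zip(1:10, 11:20)
z[4]  # (4, 14)
```

`map` over a tuple returns a tuple, so this behaves the same as the broadcast version. A fuller version would probably also want `iterate`, `length`, and `lastindex`, which are left out here.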

Bumping old post, but this comes up repeatedly here and in other forums. StructArrays is not too heavy and works well in many situations:

julia> a = 1:100;

julia> @btime zip($a, $a);
  2.419 ns (0 allocations: 0 bytes)

julia> za = zip(a, a);

julia> @time using StructArrays
  0.074154 seconds (141.46 k allocations: 9.973 MiB, 14.93% compilation time)

julia> @btime StructArray(($a, $a));
  5.669 ns (0 allocations: 0 bytes)

julia> sa = StructArray((a, a));

julia> @btime first($za)
  2.852 ns (0 allocations: 0 bytes)
(1, 1)

julia> @btime first($sa)
  2.197 ns (0 allocations: 0 bytes)
(1, 1)

julia> @btime $sa[10]
  2.418 ns (0 allocations: 0 bytes)
(10, 10)

A zippier name for the constructor than StructArray or StructVector would make it more appealing…

julia> const zippy = StructArray;

julia> (ra, rb) = (rand(10), rand(10));

julia> findall(((x, y),) -> x > 0.5 && y < 0.5, zippy((ra, rb)))
2-element Vector{Int64}:
  9
 10

The issue with StructArrays is often not using it, but discovering it.
