[ANN] View common generators as arrays

The new package GeneratorArrays effectively converts a generator into a lazy form of a read-only abstract array.

The object array(f(i) for i in a) can be used like [f(i) for i in a] without using an extra array for storage. If a is an AbstractArray it is equivalent to MappedArrays.mappedarray(f, a).

Example:

julia> g = ( i - j for i in 1:5, j in 2:4)
Base.Generator{Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}},var"#56#57"}(var"#56#57"(), Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}}((1:5, 2:4)))

julia> array(g)
5Ă—3 GeneratorArrays.GeneratorArray{Int64,2,Base.Generator{Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}},var"#56#57"}}:
 -1  -2  -3
  0  -1  -2
  1   0  -1
  2   1   0
  3   2   1

julia> array(g)[5,2]
2
19 Likes

Hm, wouldn’t this fit perfectly into LazyArrays.jl, as another subtype of LazyArray?

100 thumbs up

julia> using StatsBase; using GeneratorArrays

julia> countmap(array(rand(1:3) for _ in 1:10^5))
Dict{Int64,Int64} with 3 entries:
  2 => 33428
  3 => 33235
  1 => 33337

This really fits well with the philosophy of doing the conversion at the call site.

How does element access happen? Am I correct that this might only be faster if you are only accessing each element a handful of times?

Generators are really just an anonymous function + an iteration space. They (currently) only support iteration, and folks generally assume that they’ll iterate sequentially, but this package reaches into their internals to compute the axes of the iteration space and appropriately call the anonymous function on demand upon indexing.

We’re slowly working our way along this path in base itself, and I’d love to see us keep sliding down this slope.

12 Likes

Couldn’t this be implemented as a thin wrapper around MappedArrays, i.e. just by a constructor for a MappedArray from a Generator?

6 Likes

Am I correct that this might only be faster if you are only accessing each element a handful of times?

Like many “lazy” approaches, this implementation trades space for speed. But the main design goal was not to save space but to make the generator syntax available for all the methods expecting abstract arrays, for a wide range of generators.
You will see speed advantages, if the saved space is huge and/or you want only to access some of the indices. There is room for performance improvements specially for the product iterator case.

1 Like

wouldn’t this fit perfectly into LazyArrays.jl?

Couldn’t this be implemented as a thin wrapper around MappedArrays?

There are now MappedArrays, LazyArray, and GeneratorArrays, which seem to overlap. But I think that is only the case in the most elementary situations and each of the packages adds another dimension of features.
In the base case, these are more or less equivalent:

map(f, A)      => LazyArrays.applied(f, A)
map(f, A)      => MappedArrays.mappedarray(f, A)
[x for x in A] => GeneratorArrays.array(x for x in A)
f.(A)          => non-materializing form for broadcasting - new syntax?

In contrast to the other approaches, GeneratorArray adds two stand-alone features.

  1. It makes generators and iterators usable as arguments of methods, which accept AbstractArray; at least for some widely used forms of generators.
  2. It re-establishes or creates (array-) structure which has been lost by the generator notation. The ProductIterator forms higher dimensions as space-product of lower dimensions.

For these reasons I think, the package provides enough orthogonality to justify its independence and should be maybe moved to JuliaArrays besides the other two.

If that should not be assumed, the manual should mention that (sorry if I missed it). TBH, I was under the impression that generators do iterate sequentially, as they are closely related to collect.

3 Likes