How do I retrieve values using an array generator?

I know how to create an array using

array = [x + y for x in 2:8, y in -1:3]

But I am trying to figure out how to retrieve elements of an array that the array generator creates e.g.

array_gen = (x + y for x in 2:8, y in -1:3)

How do I extract the values I need e.g. if I just want the second row in the array? I know indexing won’t work in this case, do I just use collect() and then index it?

Yes, I don’t think there is any other good way, especially if there is dimensional structure. In principle, the generator just produces one value after the next. I don’t think there is any way to partially materialize it (except sequentially from the start using first, but that loses the dimensional information), because in principle the generator could have state.
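To make the two options concrete, here is a minimal sketch using the generator from the question. The variable names A, second_row, and head are just for illustration, and first(itr, n) on a general iterator assumes a reasonably recent Julia (1.6 or later).

```julia
# Generator from the question; it has a 7×5 shape but cannot be indexed.
array_gen = (x + y for x in 2:8, y in -1:3)

# Materialize once, then index as usual.
A = collect(array_gen)      # 7×5 Matrix{Int64}
second_row = A[2, :]        # [2, 3, 4, 5, 6]

# Taking from the front is lazy, but yields a flat Vector in
# column-major order, so the row/column structure is gone.
head = first(array_gen, 3)  # [1, 2, 3]
```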


I see. So what would be a good use case for array generators, other than the memory savings people mention? For example:

array_gen_plus = (x + y for x in 2:8, y in -1:3)
array_gen_mult = (x * y for x in 2:8, y in -1:3)

if true
    collect(array_gen_plus)
else
    collect(array_gen_mult)
end

Something like this?

Firstly, it’s called a “generator expression”, and it’s often better to think of it as just a bit of syntactic sugar over Iterators.map. Furthermore, some user packages present interfaces similar or identical to Iterators.map but improved in some respects, e.g. FlexiMaps and LazyMapWithElType, so you might want to use one of those instead.

As for what’s a good use case: a generator expression returns an iterator, and iterators are good for iterating over, e.g. for el ∈ iterator; ...; end. Or you can use mapreduce or something similar instead of for, of course.
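A small sketch of what that looks like in practice. The helper total is hypothetical (written only for this illustration), and Iterators.map assumes Julia 1.6 or later.

```julia
# A generator expression and the corresponding Iterators.map call.
gen  = (x^2 for x in 1:5)
lazy = Iterators.map(x -> x^2, 1:5)

# Hypothetical helper: consume any iterator with a plain for loop,
# without ever building an intermediate array.
function total(itr)
    acc = 0
    for el in itr
        acc += el
    end
    return acc
end

t_gen   = total(gen)                      # 1 + 4 + 9 + 16 + 25 = 55
t_lazy  = total(lazy)                     # same values, same result
biggest = mapreduce(identity, max, gen)   # 25
```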

Not sure if you care, but this code would be nicer after some deduplication. Open a new topic if you’re interested in exploring this.

Generators are nice when you want to avoid collecting, instead just iterating without allocating any temporary array. If you just want to collect, then use an array comprehension.


As DNF said, generators are good when you don’t want to collect.

Having said that, if you want to pluck a single element from the generated Array, you might want to avoid generating the elements that come after it. The following function attempts this:

# Walk the generator only up to the requested position and return the last value seen.
generatorgetindex(g, I...) =
  foldl((x, y) -> y, Iterators.take(g, LinearIndices(size(g))[I...]))

For example, with array_gen from the OP:

julia> collect(array_gen)
7×5 Matrix{Int64}:
 1  2  3   4   5
 2  3  4   5   6
 3  4  5   6   7
 4  5  6   7   8
 5  6  7   8   9
 6  7  8   9  10
 7  8  9  10  11

julia> generatorgetindex(array_gen, 2,4)
5
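Along the same lines, here is a rough sketch (not a built-in feature, just one way to do it) of pulling out the second row, and a single element, without materializing the whole matrix. It assumes the generator has a shape, so that size works on it. Every element up to the last one needed is still computed, but only the selected values are stored. The names second_row and elem are illustrative.

```julia
array_gen = (x + y for x in 2:8, y in -1:3)

# Pair each value with its Cartesian index and keep only row 2.
idx = CartesianIndices(size(array_gen))
second_row = [v for (I, v) in zip(idx, array_gen) if I[1] == 2]
# second_row == [2, 3, 4, 5, 6]

# Single element at (2, 4): skip the preceding elements, then take one.
n = LinearIndices(size(array_gen))[2, 4]
elem = first(Iterators.drop(array_gen, n - 1))   # 5
```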

Thank you! This is exactly what I was thinking about, but it seems like there isn’t a function to do this natively?

Maybe I just haven’t come across a good use case for what I’ve been doing so far, but I can’t figure out when I would need to iterate without collecting. Unless say I have several possible iterated arrays of huge dimensions that I don’t want to allocate at the start?

A classic application is if you want to find the sum or maximum over something:

sum(tan(x)^2 for x in X) 

is more efficient than

sum([tan(x)^2 for x in X]) 

or

sum(tan.(X).^2) 
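The difference is that the last two forms build a temporary array before summing, while the generator feeds values to sum one at a time. A minimal side-by-side sketch, with X chosen arbitrarily for illustration (the function form sum(f, X) is included as another way to avoid the temporary):

```julia
X = range(0.1, 1.0; length = 1000)

s_gen   = sum(tan(x)^2 for x in X)     # lazy: no temporary array
s_fun   = sum(x -> tan(x)^2, X)        # function form, also no temporary array
s_array = sum([tan(x)^2 for x in X])   # allocates a 1000-element Vector first
s_bcast = sum(tan.(X).^2)              # fused broadcast, still allocates one Vector
```

All four give the same result up to floating-point rounding, since the summation order can differ between iterators and arrays.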

My favorite is usually mapreduce((x -> x^2) ∘ tan, +, X, init = zero(eltype(X))).

mapreduce should be better than composing iterators several levels deep, at least with the current iteration protocol.

One thing I really like about the generator sum is that I don’t have to mess about to find the correct initial value type.

sum isn’t different than mapreduce in this regard, it seems:

julia> sum(Int[])
0

julia> reduce(+, Int[])
0

julia> mapreduce(identity, +, Int[])
0

EDIT: no, you’re right. The above result is because identity is special-cased, I guess.

Still, even sum will fail when it can’t infer a good type, e.g., sum([]) fails. Or it could return an inappropriate type if the element type of the collection it’s given is abstract.
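When the empty case matters, passing init explicitly sidesteps the question. A short sketch, assuming a Julia version where sum accepts the init keyword (1.6 or later); the variable names are illustrative:

```julia
# With a concrete element type, an empty sum can fall back to zero(Int).
s_empty = sum(Int[])                               # 0

# Passing init explicitly makes the empty case well defined
# even when a function or generator sits in between.
s_gen = sum((x^2 for x in Int[]); init = 0)        # 0
s_map = mapreduce(x -> x^2, +, Int[]; init = 0)    # 0

# init also determines the accumulator type when the data are non-empty.
s_float = sum((x^2 for x in 1:3); init = 0.0)      # 14.0
```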