Filtering elements of a generator without collecting

Hi everyone, I am playing around with iterators and have the following problem. Let us say we want to compute

using Random

weights = shuffle(vcat(1:5, zeros(5)));
mats = (randn(10, 10) for _ in 1:10) # Or any generator comprehension! 
sum(Iterators.map(*, weights, mats))

Then, since weights has many zeros, I ideally would like to skip the corresponding elements in mats (i.e., never generate them at all).

Note: Here, mats could in principle be any generator comprehension (not necessarily random). Think of it as handed to me as a black box. On top of that, it would be too expensive to collect and index, so I am looking for something that works even if the only promise on mats is that it is a generator comprehension.

Is there some way to do that without collect(mats)?

How about this:

using Random

function maybeskip(generate, skip, flag)
    if flag == 0.0
        println("Skip")
        skip()
    else
        println("Generate")
        generate()
    end
end

generate() = randn(10, 10)
skip() = zeros(10,10)

function test()
    weights = shuffle(vcat(1:5, zeros(5)))
    sum(Iterators.map(*, weights, maybeskip.(generate, skip, weights)))
end

If you don’t need printing, maybeskip is a lot like ifelse.


skip = findall(==(0), weights)

mats = (rand(10, 10) for i in setdiff(1:10, skip))

sum(generate_mat() * w for w in weights if w != 0)

Or just write a loop. Loops are fast in Julia, you don’t need to contort yourself to avoid them.


@stevengj @rocco_sprmnt21 @contradict First, thank you all for the suggestions, and second, I am sorry that my question was not clear enough. In particular, I am trying to wire up something that works for any generator comprehension given as a black box. The random matrices example was just for the sake of concreteness. I have edited the post to make this clearer.

The iterator protocol does not provide a way to skip an element without fetching its value, so if you need the elements of mats that correspond to the elements of weights then there is no way to avoid computing all of the elements of mats without digging deeper into Generator objects.
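To make this concrete, here is a small sketch that counts how often the generator's body runs when the filtering happens after zipping. The counter ncalls, the small fill matrices, and the fixed weights vector are stand-ins invented for illustration, not part of the original question.

```julia
# Hypothetical call counter: incremented every time the generator body runs.
ncalls = Ref(0)

# Stand-in for the black-box generator: a cheap 2x2 matrix instead of randn(10, 10).
mats = (begin
            ncalls[] += 1
            fill(float(i), 2, 2)
        end for i in 1:10)

# Fixed weights (three nonzero entries) so the result is reproducible.
weights = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 0.0]

# The `if w != 0` filter only sees (w, m) pairs after zip has already produced m,
# so every matrix is built, including the ones that are then discarded.
total = sum(w * m for (w, m) in zip(weights, mats) if w != 0)

println(ncalls[])  # number of matrices actually constructed
```

Here the body runs once per element of mats, even though only three weights are nonzero.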

In particular, a generator is actually just a combination of an iterator and a function that acts on that iterator, so you could do e.g.:

sum(mats.f(i) * w for (w, i) in zip(weights, mats.iter) if w != 0)
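A quick way to check that this really skips work is to count calls to the generator body. The sketch below uses an invented counter (ncalls), small fill matrices, and a fixed weights vector purely for illustration. It also assumes that mats is a plain Base.Generator; f and iter are internal fields rather than documented API, and a comprehension with an if clause or several for clauses has a different structure, so the same trick would not apply unchanged.

```julia
# Hypothetical call counter: incremented every time the generator body runs.
ncalls = Ref(0)

# Stand-in for the black-box generator.
mats = (begin
            ncalls[] += 1
            fill(float(i), 2, 2)
        end for i in 1:10)

# Fixed weights (three nonzero entries) so the result is reproducible.
weights = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 0.0]

# Assumption: mats is a plain Base.Generator, so its two fields can be used directly.
@assert mats isa Base.Generator

# Zip the weights with the underlying index iterator, not with mats itself,
# and only call mats.f on indices whose weight is nonzero.
total = sum(mats.f(i) * w for (w, i) in zip(weights, mats.iter) if w != 0)

println(ncalls[])  # number of matrices actually constructed
```

With these weights the body runs only three times, once per nonzero weight.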

(But in this case, why form a mats generator object in the first place? If you need to look inside it, then a generator is arguably the wrong abstraction. e.g. why not just pass the function f(i) that generates the i-th matrix?)
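For comparison, a minimal sketch of that alternative, where the caller hands over a function instead of a generator. The name make_mat and its body are hypothetical placeholders for whatever expensive computation produces the i-th matrix.

```julia
using Random

# Hypothetical stand-in for the expensive black box: builds the i-th matrix
# and records how many times it was called.
const ncalls = Ref(0)
function make_mat(i)
    ncalls[] += 1
    return fill(float(i), 2, 2)
end

weights = shuffle(vcat(1:5, zeros(5)))

# Only indices with a nonzero weight ever reach make_mat.
total = sum(weights[i] * make_mat(i) for i in eachindex(weights) if weights[i] != 0)

println(ncalls[])  # number of matrices actually constructed
```

Since the function takes the index explicitly, nothing needs to know about the internals of a Generator object, and skipping is just a filter on the indices.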