Let’s say I have a big array:
big = rand(1:4, 8_000_000)
I have a loop that runs 180 times; each pass filters big and does something with the results. Each time I filter big, the result vector is a different size (trivially so in this example, but with much more variation in the real code), so it’s hard to pre-allocate a result and just update it in place.
Instead, I want the filter to be an iterator so I can loop through the results one at a time: the filtered array is never created or allocated, and I just get one scalar per iteration. At least, that is what I am assuming. But I can’t get it to work out that way. Or maybe I am expecting the wrong thing.
julia> big = rand(1:4, 8_000_000);
julia> ifiltbig = Iterators.filter(x->x==4, big)
Base.Iterators.Filter{var"#119#120",Array{Int64,1}}(var"#119#120"(), [2, 4, 3, 4, 2, 2, 3, 4, 2, 4 … 3, 4, 4, 3, 1, 4, 4, 4, 1, 4])
julia> @btime for i in $ifiltbig
i * 4 # we really do more with i...
end;
18.252 ms (0 allocations: 0 bytes)
julia> @btime for i in filter(x->x==4, $big)
i * 4 # really, do more...
end;
12.433 ms (3 allocations: 61.04 MiB)
My largest real source array has 8.4 million rows, so this example is close in size. The iterator version is slower but makes no allocations. It certainly uses less RAM, though RAM isn’t a problem with an 8 million by 13 array of Int. The non-iterator version has to create a temporary array to hold the result of the filter, which allocates a lot of RAM (though the amount doesn’t make sense to me…), and it still runs faster because it iterates over the contiguous storage of the filtered result. So avoiding allocations is not helping execution speed.
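For anyone who wants to reproduce the comparison without BenchmarkTools, here is a minimal sketch of the two loops written as functions. The names `sum_lazy` and `sum_eager` and the small array size are made up for this example, and accumulating into `total` stands in for the real work done with `i`.

```julia
# Smaller stand-in for `big`; the size is an arbitrary choice for this sketch.
small = rand(1:4, 1_000)

# Hypothetical helper: lazy filter, no intermediate array is built.
function sum_lazy(v)
    total = 0
    for i in Iterators.filter(x -> x == 4, v)
        total += i * 4   # placeholder for the real per-element work
    end
    return total
end

# Hypothetical helper: eager filter, materializes a temporary Vector first.
function sum_eager(v)
    total = 0
    for i in filter(x -> x == 4, v)
        total += i * 4   # same placeholder work
    end
    return total
end

# Both versions visit the same elements, so the totals should agree.
sum_lazy(small) == sum_eager(small)
```

Using the loop result (here by returning `total`) may also make timings more trustworthy, since a loop body whose value is discarded could in principle be optimized away.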
I would have expected the allocated memory to be 8 bytes * 8 million items * approximately 0.25 (the fraction of items that pass the filter) = approx. 1.6e7 bytes, which is about 15 MiB rather than the 61.04 MiB reported.
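Spelling the arithmetic out as a quick check: the reported 61.04 MiB happens to match the size of a full-length Int64 vector, not a quarter-length one. Whether filter really allocates a buffer as long as its input before shrinking it is only a guess on my part, not something I have verified; the variable names below are just for illustration.

```julia
n = 8_000_000
bytes_per_int = sizeof(Int64)            # 8 bytes per element

# What I expected: only the ~25% of matching elements are stored.
expected = n * bytes_per_int * 0.25      # 1.6e7 bytes

# Size of a vector as long as `big` itself (assumption: filter sizes its buffer this way).
full_size = n * bytes_per_int            # 6.4e7 bytes

expected_mib = expected / 2^20           # ≈ 15.26 MiB
full_mib = full_size / 2^20              # ≈ 61.04 MiB, the figure @btime reported
```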
As Jefferson would say, “What did I miss?” (Hamilton musical…)