Why do some functions not output the result directly?

Hello! For context, I’m an R user.

I’ve been trying to use this function

using IterTools
IterTools.product(collect(1:5), collect(1:5))

But I get this output.

Base.Iterators.ProductIterator{Tuple{Vector{Int64}, Vector{Int64}}}(([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))

This is somewhat confusing to me because in R, when you call a function, you usually get the output right away.

expand.grid(c(1:5), c(1:5))
   Var1 Var2
1     1    1
2     2    1
3     3    1
4     4    1
5     5    1
6     1    2
7     2    2
8     3    2
9     4    2
10    5    2
11    1    3
12    2    3
13    3    3
14    4    3
15    5    3
16    1    4
17    2    4
18    3    4
19    4    4
20    5    4
21    1    5
22    2    5
23    3    5
24    4    5
25    5    5

Now I see from another thread that you can get the output by passing it to collect (as shown here), but why is this necessary? And is collect the function to always use when you want to see what the function outputs?

Lazy computation can be really nice. For a simple example, consider

for (i,j) in IterTools.product(collect(1:5), collect(1:5))
    f(i,j)
end

Since this is lazy, we don’t have to allocate a ton of memory for no reason. Giving the result eagerly would just be slower.
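Here is a runnable version of the loop above. It is a minimal sketch in which `f` is a hypothetical stand-in (any two-argument function would do), and it uses `Iterators.product` from Base, so no package is needed. Each pair is produced on demand, and no 5×5 grid is ever allocated.

```julia
# Hypothetical stand-in for `f`; any function of two arguments works.
f(i, j) = i * j

# Accumulate f over every (i, j) pair without materializing the product.
function total_over_grid(n)
    total = 0
    for (i, j) in Iterators.product(1:n, 1:n)
        total += f(i, j)
    end
    return total
end

total_over_grid(5)   # (1 + 2 + 3 + 4 + 5)^2 = 225
```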


I’d avoid collect altogether, i.e. replace collect(1:5) with 1:5.
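A quick check of that suggestion: dropping the inner `collect` calls gives exactly the same product, while a range such as `1:5` stores only its endpoints instead of every element.

```julia
with_ranges  = collect(Iterators.product(1:5, 1:5))
with_vectors = collect(Iterators.product(collect(1:5), collect(1:5)))
with_ranges == with_vectors               # true: same 5×5 matrix of tuples

# A range stores only its start and stop; a Vector stores every element.
sizeof(1:1000) < sizeof(collect(1:1000))  # true
```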


Is there somewhere I can read more about this? And is there a way to turn it off? I can’t quite seem to find a page for it.

Why do you want to turn it off?

It’s not a feature of the language, just of how the authors of Iterators implemented product. See the source code:

# from iterators.jl in Base
struct ProductIterator{T<:Tuple}
    iterators::T
end

product(iters...) = ProductIterator(iters)

The product function returns a ProductIterator, which, as its name suggests, iterates over the product. It is nothing more than a wrapper around a tuple of iterators.
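To make the "just a wrapper" point concrete, here is a toy two-iterator product. It is a hypothetical sketch, not Base's actual implementation: the struct only stores the two iterators, and all of the work happens in `Base.iterate`, one element at a time. Like `Iterators.product`, it varies the first iterator fastest.

```julia
# Toy two-iterator product (hypothetical, NOT Base's implementation).
# Like ProductIterator, it is just a wrapper that stores its iterators.
struct PairProduct{A,B}
    a::A
    b::B
end

# No `length` is defined, so tell `collect` to grow its result as it goes.
Base.IteratorSize(::Type{<:PairProduct}) = Base.SizeUnknown()
Base.eltype(::Type{PairProduct{A,B}}) where {A,B} = Tuple{eltype(A),eltype(B)}

# State is (state of `a`, current element of `b`, state of `b`).
function Base.iterate(p::PairProduct)
    yb = iterate(p.b)
    yb === nothing && return nothing
    ya = iterate(p.a)
    ya === nothing && return nothing
    return ((ya[1], yb[1]), (ya[2], yb[1], yb[2]))
end

function Base.iterate(p::PairProduct, (sa, b, sb))
    ya = iterate(p.a, sa)
    if ya === nothing                 # `a` exhausted: advance `b`, restart `a`
        yb = iterate(p.b, sb)
        yb === nothing && return nothing
        ya = iterate(p.a)
        ya === nothing && return nothing
        return ((ya[1], yb[1]), (ya[2], yb[1], yb[2]))
    end
    return ((ya[1], b), (ya[2], b, sb))
end

collect(PairProduct(1:2, 1:3))   # [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
```

Nothing is computed when `PairProduct(1:2, 1:3)` is constructed; the pairs only appear once something (here `collect`) calls `iterate`.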

If you want, you can define your own:
myproduct(iters...) = collect(Iterators.product(iters...))

What is a feature of the language is that loops are fast while memory allocation is “slow” (slower than not allocating, anyway). So, unlike in R or Python, the less memory you can get away with allocating, the faster your code will often be. Hence many package authors opt for lazy evaluation, as in this case.

For reference, lazy vs eager evaluation:

julia> using BenchmarkTools

julia> @btime [i+j for (i, j) in Iterators.product(1:100, 1:100)];
  6.669 μs (2 allocations: 78.17 KiB)

julia> @btime [i+j for (i, j) in collect(Iterators.product(collect(1:100), collect(1:100)))];
  21.276 μs (6 allocations: 236.22 KiB)
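If BenchmarkTools isn't installed, Base's `@allocated` gives a rough version of the same comparison. This is a sketch: each version is wrapped in a function and called once first so that compilation is not counted, and since exact byte counts vary between Julia versions, only the ordering is worth comparing.

```julia
# Lazy: sums i + j over the grid without ever building it.
sum_lazy(n) = sum(i + j for (i, j) in Iterators.product(1:n, 1:n))

# Eager: materializes two vectors and an n×n matrix of tuples first.
sum_eager(n) = sum(i + j for (i, j) in collect(Iterators.product(collect(1:n), collect(1:n))))

sum_lazy(100) == sum_eager(100)    # true: both are 100^2 * 101 = 1_010_000

sum_lazy(100); sum_eager(100)      # warm up (compile) both functions
@allocated(sum_eager(100)) > @allocated(sum_lazy(100))   # the eager one allocates more
```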

Well, as a newbie it helps to see what each function is actually outputting to make things easier to debug and understand.

For example, I was hoping to see what IterTools.product(1:5,1:5) outputted, so that I could figure out how to use its output for my next goal.

Now that I’m interacting with it more, there are some things I don’t understand.

For example, this fails:

IterTools.product(1:5,1:5)[1,1]

But this works:

IterTools.product(1:5,1:5) |> collect
ans[1,1]

So it seems like collect is doing something important.

Actually, that’s exactly what it’s showing you! See my reply from a moment ago above.

Correct. collect allocates an array of the appropriate size and then iterates over the ProductIterator, filling in each index in the array.
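A simplified sketch of that process, using a hypothetical helper `collect_product` (Base's real `collect` is more general and also handles iterators of unknown size): allocate an array of the right shape and element type, then walk the lazy iterator and fill it in.

```julia
# Hypothetical, simplified version of what `collect` does for a product.
function collect_product(p)
    out = Array{eltype(p)}(undef, size(p))   # allocate the full result up front
    for (k, x) in enumerate(p)               # then walk the lazy iterator
        out[k] = x                           # linear index; first index varies fastest
    end
    return out
end

p = Iterators.product(1:5, 1:5)
collect_product(p) == collect(p)   # true
```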

ProductIterator doesn’t define getindex (or any other indexing-related functions), so you can’t index into it like you would an array. It implements the iteration interface (iterate, along with helpers such as length, size and eltype), not the indexing one. In principle, you could define a getindex for it (but only if the individual iterators support getindex), though you may have to do it yourself…
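A sketch of both halves of that answer. Indexing the lazy object throws a `MethodError`; a hypothetical helper `product_getindex` (not part of Base) can instead index each component iterator, assuming they all support `getindex`. It reads the `iterators` field from the struct definition above, which is an internal detail that could change. A separate function is used because adding a `Base.getindex` method for a Base type in your own code would be type piracy.

```julia
p = Iterators.product(1:5, 1:5)

# Indexing the lazy object fails: no getindex method exists for it.
threw = try
    p[2, 3]
    false
catch err
    err isa MethodError
end

# Hypothetical helper: index each component iterator separately.
# Relies on the internal `iterators` field of ProductIterator.
product_getindex(p, idx...) = map((it, i) -> it[i], p.iterators, idx)

product_getindex(p, 2, 3) == collect(p)[2, 3]   # true: both give (2, 3)
```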


I think this helps, though I think I have to go through the manual in more detail.

That being said, I think I’m getting a glimpse of why everyone’s so excited about the language now. It feels like a space-age tool compared to the manual things I’d have to do in R.

The concept that this thread is describing is called lazy vs. eager evaluation (sometimes described as lazy vs. eager loading). The idea is simple. When you execute expand.grid(c(1:5), c(1:5)) in R, memory is allocated for all 25 rows whether or not you use the data. This is eager evaluation: it takes time and consumes memory up front.

However, in many situations you don’t need all the elements at once. You can “lazily” produce each element when you need it, and this is exactly what IterTools.product does. It knows internally that you are describing a 25-element grid, but it won’t compute any element or allocate memory for the grid until you actually consume the data.
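You can also consume just part of a lazy iterator, which makes the "only when you need it" idea visible. Both `first` and `Iterators.take` are in Base; in this sketch only the requested elements are ever produced.

```julia
p = Iterators.product(1:5, 1:5)    # nothing computed yet

first(p)                           # (1, 1): only the first element is produced
collect(Iterators.take(p, 3))      # [(1, 1), (2, 1), (3, 1)]: only three are produced
```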

So in this case, one way to “consume” all the data is to call collect on it. This allocates memory for all the elements and returns the resulting array, which the REPL then prints.


Would it perhaps be possible to change the show method for Base.Iterators.ProductIterator to have it display like the OP’s request when it’s evaluated in the repl, while still keeping it lazy when used in computations?

I’m not sure how much IterTools is being used these days. The product function, at least, is available in the Iterators standard library, which you don’t need to import. So you can write

Iterators.product(1:5, 1:5)

instead of

using IterTools
IterTools.product(collect(1:5), collect(1:5))

It’s pretty cool that one can write

Iterators.product(1:10^9, 1:10^9)

and it doesn’t take up any more space than Iterators.product(1:2, 1:2). As long as you don’t use collect anywhere(!)
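A quick way to check this, assuming a 64-bit Julia (where an `Int` can hold 10^18): both objects have the same type and size, because each only stores two ranges, yet the large one still knows how many pairs it represents.

```julia
huge = Iterators.product(1:10^9, 1:10^9)
tiny = Iterators.product(1:2, 1:2)

sizeof(huge) == sizeof(tiny)   # true: each only stores two ranges
length(huge)                   # 10^18 pairs, none of them stored
first(huge)                    # (1, 1), produced on demand
```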


Isn’t Iterators deprecated?

The Iterators.jl package is deprecated, but I’m talking about the module Iterators, which is part of Julia Base, i.e. Base.Iterators. (I was a bit inaccurate in saying it’s a stdlib, it’s actually in Base.)

So you don’t need to load it, it’s always available, and you can just write

Iterators.product(...)

directly, without using.


You’re right, that does work.

I think the simple answer for the OP is to apply the collect function to the lazy-iterator object to display the whole collection.

julia> Iterators.product(1:5, 1:5)
Base.Iterators.ProductIterator{Tuple{UnitRange{Int64}, UnitRange{Int64}}}((1:5, 1:5))

julia> collect(Iterators.product(1:5, 1:5))
5×5 Matrix{Tuple{Int64, Int64}}:
 (1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)
 (2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)
 (3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)
 (4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)
 (5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)

This works generally, e.g.

julia> 1:5
1:5

julia> typeof(1:5)
UnitRange{Int64}

julia> collect(1:5)
5-element Vector{Int64}:
 1
 2
 3
 4
 5

One could print out the result without collecting:

for i in Iterators.product(1:5, 1:5)
    println(i)
end
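Under the hood, a `for` loop like this is lowered to repeated calls of `iterate`, which returns either `(element, state)` or `nothing`. The helper `iterate_by_hand` below is a hypothetical name; it is a sketch of roughly what the loop does, written inside a function to avoid global-scope pitfalls.

```julia
# Roughly what `for x in itr ... end` does, spelled out by hand.
function iterate_by_hand(itr)
    out = eltype(itr)[]
    y = iterate(itr)                  # (element, state) or `nothing`
    while y !== nothing
        item, state = y
        push!(out, item)
        y = iterate(itr, state)       # the state is opaque; just pass it back
    end
    return out
end

iterate_by_hand(Iterators.product(1:2, 1:2))   # [(1, 1), (2, 1), (1, 2), (2, 2)]
```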

I should maybe point out that the reason that an Iterators.product(1:5, 1:5) instance doesn’t display itself by collecting is two-fold:

  1. Some iterators are destructive: once you start iterating them, you cannot “replay” that iteration, so in general doing iteration in the show method for a product iterator is not a good idea.
  2. Even if all the iterators that the product includes can be replayed, you’d want to be clever about only showing the beginning and ending of a very large iterator, which you can’t do for many iterators since they don’t generally support random access or reversal.
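A small illustration of point 1, using `Iterators.Stateful` from Base as an example of a destructive iterator (reading lines from a stream with `eachline` behaves similarly). If `show` iterated such an object just to print it, the elements would be gone.

```julia
s = Iterators.Stateful(1:3)   # wraps 1:3 and remembers how far it has got

collect(s)   # [1, 2, 3]
collect(s)   # empty: the elements were consumed and cannot be replayed
```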

If you explicitly collect an iterator, neither of these is an issue anymore. As has been noted, if you’d prefer to have an eager product, you can define

product(iters...) = collect(Iterators.product(iters...))

This gives you the eager behavior:

julia> product(1:5, 1:5)
5×5 Matrix{Tuple{Int64, Int64}}:
 (1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)
 (2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)
 (3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)
 (4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)
 (5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)

If you want it, the lazy version is still available as Iterators.product.


Well now that I understand more about how this all works, I think the lazy evaluation makes sense and that it also makes sense to not ‘see it itself’ for lack of a better term. It’s just that to a newbie from R, some of this is rather mystifying and it’s not always clear what to Google to clear things up.

So the confusion isn’t so much about Iterators.product() itself; it’s a more general confusion. For example,

using Distributions
Binomial(1)

gives

Binomial{Float64}(n=1, p=0.5)

But clearly, collect is not the way to go here if I want to see the “actual output” (which is a misnomer). The documentation for Binomial doesn’t quite give the answer, but you can find it in the Distributions docs:

rand(Binomial(1), 1)

I’d say that in these cases you are seeing the actual output, but what you want to know is the behavior, i.e. that an Iterators.ProductIterator defines iteration, and what the Base.iterate method actually returns.

That the Binomial object defines rand, logpdf, cdf, quantile etc.
There is a lot someone may want to do with a binomial other than sample from it.
In R, how did you find out about rbinom, dbinom(..., log = TRUE), pbinom, and qbinom?
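For comparison, here is a sketch of the Distributions.jl calls that roughly correspond to those R functions (it assumes the Distributions package is installed). In Julia the distribution object comes first, and generic functions such as `pdf` or `cdf` are applied to it, instead of R's one-function-per-distribution naming.

```julia
using Distributions   # assumes the package is installed

d = Binomial(10, 0.5)

pdf(d, 5)          # like R's dbinom(5, 10, 0.5)
logpdf(d, 5)       # like dbinom(5, 10, 0.5, log = TRUE)
cdf(d, 5)          # like pbinom(5, 10, 0.5)
quantile(d, 0.5)   # like qbinom(0.5, 10, 0.5)
rand(d, 3)         # like rbinom(3, 10, 0.5)
```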

Probably a bit overwhelming, in that I see 152 methods:

julia> methodswith(Binomial{Float64},supertypes=true)

While the help seems a bit brief, and doesn’t actually mention rand:

help?> Binomial
search: Binomial binomial PoissonBinomial NegativeBinomial BetaBinomial

  Binomial(n,p)


  A Binomial distribution characterizes the number of successes in a sequence of independent trials. It has two parameters:
  n, the number of trials, and p, the probability of success in an individual trial, with the distribution:

  P(X = k) = {n \choose k}p^k(1-p)^{n-k},  \quad \text{ for } k = 0,1,2, \ldots, n.

  Binomial()      # Binomial distribution with n = 1 and p = 0.5
  Binomial(n)     # Binomial distribution for n trials with success rate p = 0.5
  Binomial(n, p)  # Binomial distribution for n trials with success rate p

  params(d)       # Get the parameters, i.e. (n, p)
  ntrials(d)      # Get the number of trials, i.e. n
  succprob(d)     # Get the success rate, i.e. p
  failprob(d)     # Get the failure rate, i.e. 1 - p


  External links:

    •  Binomial distribution on Wikipedia (http://en.wikipedia.org/wiki/Binomial_distribution)

I am sure PRs to improve the documentation or make it more extensive would be welcome!
