Iterators, collections, arrays

question

#1

Hello.

I’ve been reading some basic Julia tutorials and I have a question:

Could anyone explain the difference between iterators, collections and arrays, please?
I’m coming from R, where there are no iterators or collections, so I don’t understand why we need them.

For example in R you can just do

1:10

but in Julia you need to do

collect(1:1:10)

Or you need to use

collect(permutations(1:4))

instead of just

permutations(1:4)


#2

It’s not necessary to store every single number in order to iterate over a range of equally spaced numbers, so Julia takes advantage of this. Rather than create a temporary vector, Julia uses a range type that stores only the endpoints, so it uses far less memory and behaves the same way when you iterate over it.

julia> typeof(1:10)
UnitRange{Int64}

julia> typeof(collect(1:10))
Array{Int64,1}

julia> sizeof(1:100)
16

julia> sizeof(collect(1:100))
800

#3

My understanding is that iterators simply allow you to loop over objects. In R, you loop over an integer index. In Julia, it is also possible to loop over objects. For example, you can iterate over an array of arrays:

 data = [rand(2,2) for i in 1:10]
 for d in data
    println(d)
 end

Using iterators like this is certainly not necessary, but it can be convenient and easier to read. You can also iterate over an object together with its index using enumerate(), or over multiple objects in lockstep using zip().
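To make enumerate() and zip() concrete, here is a minimal sketch. The `labels` vector and the `pairs_seen` accumulator are names made up for illustration, not anything from the posts above.

```julia
data = [rand(2, 2) for i in 1:3]   # small array of 2×2 matrices
labels = ["a", "b", "c"]           # hypothetical labels, one per matrix

# enumerate yields (index, element) pairs
for (i, d) in enumerate(data)
    println("matrix ", i, " has sum ", sum(d))
end

# zip walks several iterables in lockstep and stops at the shortest one
pairs_seen = Tuple{String,Int}[]
for (label, d) in zip(labels, data)
    push!(pairs_seen, (label, length(d)))
end
```

Neither call copies `data`; both just wrap the underlying objects and produce values as the loop asks for them.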

By the way, if you are simply initializing an array, you can use

[1:10;]

as a shorthand.


#4

What about

permutations(1:4)

And what other things can you do directly with iterators without transforming them with collect()?


#5

Many of the core mathematical operations still work with the unit range type. For example,

m = 1:4
m*m'


4×4 Array{Int64,2}:
 1  2   3   4
 2  4   6   8
 3  6   9  12
 4  8  12  16

So for certain operations, you do not need to use collect(). Multiple dispatch handles that for you. There might be other uses that I am not aware of.
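A few more operations that, as far as I know, work directly on a range without calling collect() first. This is only a sketch using functions from Base and Base.Iterators; the variable names are illustrative.

```julia
r = 1:4

total   = sum(r)                         # reduction works directly on the range
squares = r .^ 2                         # broadcasting returns a new Vector
has_3   = 3 in r                         # membership test, no array needed
evens   = Iterators.filter(iseven, r)    # lazy: nothing is computed yet

# collect only at the point where a real Vector is required
first_two = collect(Iterators.take(r, 2))
evens_vec = collect(evens)
```

The lazy wrappers from Base.Iterators only do work when something iterates over them, which is why the final collect() calls are the only places where those results are materialized.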


#6

Just assume that you never need to collect. In some small number of cases you may want to anyway, but probably less than 1% of the time.


#7

Just to be clear, a range like 1:10 in Julia is not just an iterator (= any type you can loop over, i.e. any type with start, next, done, and usually eltype and length), it is a subtype of AbstractVector (and has all usual array methods like getindex and ndims), so you can mostly treat it as a drop-in replacement for a read-only array.
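For readers on Julia 1.0 or later: the start/next/done functions mentioned above were replaced by a single `iterate` function. A minimal sketch of a custom iterator under the newer interface follows; `Countdown` is a hypothetical type invented for this example.

```julia
# Hypothetical example type: counts down from `from` to 1.
struct Countdown
    from::Int
end

# iterate(itr) starts the loop and iterate(itr, state) continues it.
# Returning `nothing` signals that the iteration is finished.
Base.iterate(c::Countdown, state = c.from) =
    state < 1 ? nothing : (state, state - 1)

# Optional, but lets collect() preallocate and pick the element type.
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

for x in Countdown(3)
    println(x)
end

collected = collect(Countdown(3))
```

Note that `Countdown` is only an iterator: it is not an AbstractVector, so indexing like `Countdown(3)[1]` is not defined unless you add more methods. That is the distinction being drawn with ranges.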


#8

That’s the key. Because a range doesn’t allocate an array, it works wherever it is only read from. 1:4 never makes an array, but A=1:4; A[1] still works. But since there is no array in memory to actually write to, A[1] = 4 fails unless you first collect to a real array. So if you pass it into algorithms which use the array but don’t write into it, 99% of the time you’re fine. The other 1% is someone too strictly typing their dispatches (i.e. a bug to report).
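A short sketch of the read-only behaviour described here. The exact exception type raised on the failed write differs between Julia versions, so the example only checks that some error is thrown.

```julia
A = 1:4

first_elem = A[1]            # indexing works: ranges are AbstractVectors

# Writing fails because there is no backing storage to write into.
write_failed = try
    A[1] = 4
    false
catch
    true
end

B = collect(A)               # a real Vector{Int} with its own memory
B[1] = 4                     # now the assignment succeeds
```

After this runs, `A` is still the range 1:4 and only the collected copy `B` has changed.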


#9

It is possible that some of those tutorials are outdated; Julia evolves very rapidly. Read the manual.

Iteration (traversal of a collection) is implemented using generic functions in Julia. This means that for each type, you can specify how it is traversed. This has various advantages: some structures have a layout which favors a certain kind of traversal, and in some cases, the values can be generated very cheaply on demand, as for 1:10. This is a big advantage compared to R, where 1:10000 means that you actually allocate that vector.

In principle, every function that expects an iterable object should be able to deal with any type that implements the interface. collect is a workaround for when that is not the case: e.g. collect(1:10) converts to a vector [1,2,3,4,5,6,7,8,9,10]. As a user, design your code so that it works with all iterables (simply not restricting the argument type will be fine in most cases). If you encounter restrictive behavior in a library, report an issue.
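As a rough illustration of "not restricting the type", here is a sketch of a function that accepts any iterable. `sum_of_squares` is a made-up helper, and it assumes the elements are integers so that starting the total at 0 is fine.

```julia
# Hypothetical helper: no type annotation on `itr`, so any iterable is accepted.
# Declaring it as `itr::Vector{Int}` instead would reject ranges and generators.
function sum_of_squares(itr)
    total = 0
    for x in itr
        total += x^2
    end
    return total
end

from_range  = sum_of_squares(1:10)                        # range, nothing collected
from_vector = sum_of_squares([1, 2, 3])                   # ordinary Vector
from_gen    = sum_of_squares(x for x in 1:10 if isodd(x)) # lazy generator
from_tuple  = sum_of_squares((1, 2, 3))                   # Tuple
```

The same method body serves all four calls; Julia compiles a specialized version for each argument type, so leaving the signature open does not by itself cost performance.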