# A good way to iterate through sequences?

I need to iterate through all sequences of integers between 1 and `A`, and of length `L`. Here `A,L` are positive integers. For example, if `A = 2, L = 2`, the sequences are:

``````
[1, 1]
[2, 1]
[1, 2]
[2, 2]
``````

The iteration should be as fast as possible. I was thinking of generating the sequences beforehand and storing them in an `L × A^L` matrix. Then I can loop through the columns of this matrix (as views).

To test this idea, I created a script, `myscript.jl` with the following contents:

``````
struct Sequences
    sequences::Array{Int, 2}
end

Base.start(S::Sequences) = 1
Base.done(S::Sequences, state) = state > size(S.sequences, 2)
Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1)
Base.eltype(::Type{Sequences}) = SubArray{Int64, 1, Array{Int64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}
Base.length(S::Sequences) = size(S.sequences, 2)

function sequences_generator(L::Integer, A::Integer)
    @assert L > 0
    @assert A > 1
    (digits.(r, A, L) .+ 1 for r = 0 : A^L - 1)
end

function Sequences(L::Integer, A::Integer)
    @assert L > 0 && A > 1
    sequences = hcat(collect(sequences_generator(L, A))...)
    Sequences(sequences)
end

function do_something(sequences)
    Z = 0.
    for sequence in sequences
        Z += rand()
    end
    return Z
end

const sequences = Sequences(3, 5)

do_something(sequences)   # pre-compile

Profile.clear_malloc_data()  # only want to measure allocations of next call, so clean slate.

do_something(sequences)
``````

Then I called this script using `julia --track-allocation=user myscript.jl`, which generates a report file called `myscript.jl.mem` containing memory allocations per line.

Everything looks fine except for the line `Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1)`, which allocates a lot of memory. Any ideas on how I can reduce the allocation footprint here? Or perhaps a different approach I can try?

I solved a very similar problem recently; check out the commented TupleGenerator code here:

https://github.com/dpsanders/LazyTaylorSeries.jl/blob/master/src/taylorN.jl

Note that to avoid allocations you should use tuples, not arrays.
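For instance (a sketch of the idea, not from the linked code; `tuple_sequences` is a hypothetical helper): converting each column of the precomputed matrix to an `NTuple` gives an isbits element type, so the resulting vector stores the tuples inline and iterating it allocates nothing per element.

``````
# Hypothetical helper: turn each column of the precomputed matrix into an
# NTuple. NTuple{L,Int} is an isbits type, so the vector stores the tuples
# inline and iterating it performs no per-element heap allocation.
tuple_sequences(M::Matrix{Int}, ::Val{L}) where {L} =
    [ntuple(i -> M[i, j], Val(L)) for j in axes(M, 2)]
``````

For example, `tuple_sequences([1 2; 3 4], Val(2))` yields `[(1, 3), (2, 4)]`.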

Note that I would like to avoid having to "compute the next sequence" if possible. That's why I wanted to pre-allocate them, maybe as an Array of Tuples? But I am not sure if this is the best approach.

``````
julia> tuples = Base.Iterators.product([1:3 for i in 1:4]...)
Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}}(1:3, Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}(1:3, Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}(1:3, 1:3)))

julia> vec(collect(tuples))
81-element Array{NTuple{4,Int64},1}:
(1, 1, 1, 1)
(2, 1, 1, 1)
(3, 1, 1, 1)
(1, 2, 1, 1)
(2, 2, 1, 1)
(3, 2, 1, 1)
(1, 3, 1, 1)
(2, 3, 1, 1)
(3, 3, 1, 1)
(1, 1, 2, 1)
⋮
(3, 3, 2, 3)
(1, 1, 3, 3)
(2, 1, 3, 3)
(3, 1, 3, 3)
(1, 2, 3, 3)
(2, 2, 3, 3)
(3, 2, 3, 3)
(1, 3, 3, 3)
(2, 3, 3, 3)
(3, 3, 3, 3)
``````

Since you said you wanted to iterate through, I thought you wanted an iterator.
Generating the next tuple is very fast, and saves the memory needed for the array. It depends on your exact use case what would be best.


Why is an array of tuples better than a matrix?

Sounds like you want this: https://julialang.org/blog/2016/02/iteration
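As a quick illustration of the difference (my sketch, not from the linked post): a tuple of `Int`s is an immutable isbits value that can be stored inline and copied by value, while a column of a matrix must be accessed through a view, which is a wrapper object carrying a reference to the parent array.

``````
# A matrix column must be accessed through a view (a wrapper that references
# the parent array), while a tuple is an immutable isbits value copied inline.
M = [1 2; 3 4]
t = (M[1, 1], M[2, 1])   # NTuple{2,Int}: plain bits, no parent reference
v = view(M, :, 1)        # SubArray: carries a reference to M
@assert isbits(t)        # can be stored inline in an array
@assert !isbits(v)       # heap-tracked because it references M
``````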


Yes, as @stevengj pointed out, I'd say the ideal way is

``````
julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, L))
myiterator (generic function with 1 method)

julia> vec([Tuple(c) for c in myiterator(3,4)])
81-element Array{NTuple{4,Int64},1}:
(1, 1, 1, 1)
(2, 1, 1, 1)
(3, 1, 1, 1)
(1, 2, 1, 1)
(2, 2, 1, 1)
(3, 2, 1, 1)
(1, 3, 1, 1)
(2, 3, 1, 1)
(3, 3, 1, 1)
(1, 1, 2, 1)
(2, 1, 2, 1)
⋮
``````

EDIT: note that if you are not on 0.7, the `CartesianIndex` interface has a different form, IIRC


Just for completeness, some slight optimization can be achieved passing `Val(L)` instead of `L` so that the `ntuple` can be statically generated

``````
julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, L))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([Tuple(c) for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3, Val(4));
334.193 ns (5 allocations: 2.89 KiB)
``````

The code doesn't work in v0.6.3. It doesn't find `CartesianIndices`.

Yep, in 0.6 you need this form

``````
julia> myiterator(A, L) = CartesianRange(CartesianIndex(ntuple(_ -> A, L)))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([c.I for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3,Val(4))
336.326 ns (4 allocations: 2.81 KiB)
81-element Array{NTuple{4,Int64},1}:
(1, 1, 1, 1)
(2, 1, 1, 1)
(3, 1, 1, 1)
(1, 2, 1, 1)
(2, 2, 1, 1)
(3, 2, 1, 1)
(1, 3, 1, 1)
(2, 3, 1, 1)
(3, 3, 1, 1)
(1, 1, 2, 1)
(2, 1, 2, 1)
(3, 1, 2, 1)
⋮
``````

If you really prefer a matrix, instead of a vector of tuples, you can also do it very efficiently, but with less concise code:

``````
julia> function mymatrix(A, VL::Val{L}) where L
           result = Matrix{Int}(L, A^L)
           for (i, c) in enumerate(myiterator(A, VL))
               result[:, i] .= c.I
           end
           return result
       end
mymatrix (generic function with 1 method)

julia> @btime mymatrix(3, Val(4))
1.075 μs (1 allocation: 2.69 KiB)
4×81 Array{Int64,2}:
1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  …  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3
1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3     2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3
1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3     1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3
1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1     3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
``````

Incidentally, improvements in the compiler allow the matrix version to be almost as fast as the tuple version in v0.7

EDIT: forgot the v0.6 definition of `mytuples`


Thanks @pablosanjose!

But I still don't understand why, in my original code (first post in this thread), the line `Base.next(S::Sequences, ...)`, where I return a view into a column, is allocating so much. What's wrong with it?

You perhaps intended to do something different than what you did. You defined a method `Base.next(::Sequences, state)` that implements the iterator API for the `Sequences` type. However, you are actually never calling this function. When you do `collect(sequences_generator(L, A))`, the output of `sequences_generator` is not a `Sequences` but a `Generator` (that's a different type). Hence, `collect` is never calling your `next` function, but the definition in `Base` for `Generator`s:

``````
julia> @which next(sequences_generator(3, 5), 1)
next(g::Base.Generator, s) in Base at generator.jl:4
``````

It is that function that is allocating so much, not your `next` function.

I am measuring the allocations within the last call to `do_something(sequences)` in my code snippet. The pre-allocation of the sequences themselves is of course expected.
What troubles me is the allocations of the `view`.

Ah, right, sorry. But then you are not allocating so much. Don't forget views do allocate, apparently. (I think I read some explanation from yuyichao somewhere.) You are just seeing the cost of that allocation, right?

``````
julia> @btime next($(Sequences(3,5)), 1)
15.995 ns (2 allocations: 80 bytes)
([1, 1, 1], 2)

julia> a = rand(3,10); @btime (view($a, :, 1), 1);
15.064 ns (2 allocations: 80 bytes)
``````

Maybe you were expecting `view` to not allocate at all?


Yes, I was. But kristoffer.carlsson just suggested that I could use UnsafeArrays.jl, which doesn't allocate.

I can give a "layman's" explanation for why `view` allocates. The reason is that `view` is a small struct in which one field is (a pointer to) the original array. As long as the view is alive, the original array cannot be deallocated. So the garbage collector needs to know all the views that exist for a given array. This is possible only if the view itself is heap allocated, since otherwise the run-time system cannot keep track of when the view is copied.

The compiler keeps getting better at catching use-cases in which it can prove that small structures like this are short-lived and not copied, and in this case the compiler can elide the allocation.
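As a small illustration of that last point (my sketch; whether the allocation is actually elided depends on the Julia version): when a view is created and consumed inside the loop body and never escapes, the compiler can prove it is short-lived, which is exactly the pattern where the allocation can be removed.

``````
# The view is created and consumed inside each iteration and never escapes,
# which is the pattern the compiler can prove safe to stack-allocate.
function sum_columns(M::Matrix{Float64})
    s = 0.0
    for j in axes(M, 2)
        col = view(M, :, j)   # small wrapper; allocation may be elided
        s += sum(col)
    end
    return s
end
``````

For example, `sum_columns(ones(3, 4))` returns `12.0`.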


I ended up using this. It is working very well and fast. Thanks all @dpsanders @stevengj @pablosanjose @Stephen_Vavasis for your replies.


To be explicit, you asked if there was a package that provided unsafe views that were guaranteed not to allocate, and I linked that package. I didn't say that using unsafe views was the best way to solve this particular problem.


Yes, of course. Thanks

The only thing that bothers me with `Base.Iterators.product([1:3 for i in 1:4]...)` is that the iterator produces the sequences with a 3×3×3×3 shape. Is there a way to generate a flat array?

I know I can `vec(collect(...))` it, but that also materializes the iterator, which I don't want. `vec` applied on the generator directly doesn't work. I also tried `Iterators.flatten`, but that flattens too much (even the tuples get flattened).

Update: I found my answer on the Slack:

``collect(Iterators.flatten((Base.Iterators.product((1:3 for i in 1:4)...),)))``
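For reference, dropping the outer `collect` keeps the same one-element-tuple trick fully lazy (my sketch of the idea): `Iterators.flatten` only unwraps the outer tuple, so it yields the product's tuples one by one, in a flat order, without materializing anything.

``````
# Wrapping the product iterator in a one-element tuple means flatten only
# unwraps that outer layer, yielding the NTuple elements lazily and flat.
flat = Iterators.flatten((Iterators.product((1:3 for i in 1:4)...),))
@assert first(flat) == (1, 1, 1, 1)
@assert length(collect(flat)) == 81   # 3^4 sequences in total
``````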