A good way to iterate through sequences?

I need to iterate through all sequences of integers between 1 and A, and of length L. Here A,L are positive integers. For example, if A = 2, L = 2, the sequences are:

 [1, 1]
 [2, 1]
 [1, 2]
 [2, 2]

The iteration should be as fast as possible. I was thinking of generating the sequences before-hand, and store them in an L times A^L matrix. Then I can loop through the columns of this matrix (as views).

To test this idea, I created a script, myscript.jl with the following contents:

struct Sequences
    sequences::Array{Int, 2}
end

Base.start(S::Sequences) = 1
Base.done(S::Sequences, state) = state > size(S.sequences, 2)
Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1)
Base.eltype(::Type{Sequences}) = SubArray{Int64, 1, Array{Int64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}
Base.length(S::Sequences) = size(S.sequences, 2)

function sequences_generator(L::Integer, A::Integer)
    @assert L > 0
    @assert A > 1
    (digits.(r, A, L) .+ 1 for r = 0 : A^L - 1)
end

function Sequences(L::Integer, A::Integer)
    @assert L > 0 && A > 1
    sequences = hcat(collect(sequences_generator(L, A))...)
    Sequences(sequences)
end

function do_something(sequences)
    Z = 0.
    for sequence in sequences
        Z += rand()
    end
    return Z
end


const sequences = Sequences(3, 5)

do_something(sequences)   # pre-compile

Profile.clear_malloc_data()  # only want to measure allocations of next call, so clean slate.

do_something(sequences)

Then I called this script using julia --track-allocation=user myscript.jl, which generates a report file called myscript.jl.mem containing memory allocations per line.

Everything looks fine except for the line Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1), which is allocating a lot of memory. Any ideas of how I can reduce the allocation footprint here? Or perhaps a different approach I can try?

I solved a very similar problem recently; check out the commented TupleGenerator code here:

://github.com/dpsanders/LazyTaylorSeries.jl/blob/master/src/taylorN.jl

Note that to avoid allocations you should use tuples, not arrays.

Note that I would like to avoid having to “compute the next sequence” if possible. That’s why I wanted to pre-allocate them, maybe a as an Array of Tuples? But I am not sure if this is the best approach.

julia> tuples = Base.Iterators.product([1:3 for i in 1:4]...)
Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}}(1:3, Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}(1:3, Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}(1:3, 1:3)))

julia> vec(collect(tuples))
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 â‹®
 (3, 3, 2, 3)
 (1, 1, 3, 3)
 (2, 1, 3, 3)
 (3, 1, 3, 3)
 (1, 2, 3, 3)
 (2, 2, 3, 3)
 (3, 2, 3, 3)
 (1, 3, 3, 3)
 (2, 3, 3, 3)
 (3, 3, 3, 3)

Since you said you wanted to iterate through, I thought you wanted an iterator.
Generating the next tuple is very fast, and saves the memory needed for the array. It depends on your exact use case what would be best.

2 Likes

Why an array of tuples is better than a matrix?

Sounds like you want this: Multidimensional algorithms and iteration

2 Likes

Yes, as @stevengj pointed out, I’d say the ideal way is

julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, Val(L)))
myiterator (generic function with 1 method)

julia> vec([Tuple(c) for c in myiterator(3,4)])
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 (2, 1, 2, 1)
....

EDIT: note that if you are not in 0.7, you have a different form of the CartesianIndex interface, IIRC

1 Like

Just for completeness, some slight optimization can be achieved passing Val(L) instead of L so that the ntuple can be statically generated

julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, L))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([Tuple(c) for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3, Val(4));
 334.193 ns (5 allocations: 2.89 KiB)
1 Like

The code doesn’t work in v0.6.3. It doesn’t find CartesianIndices.

Yep, in 0.6 you need this form

julia> myiterator(A, L) = CartesianRange(CartesianIndex(ntuple(_ -> A, L)))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([c.I for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3,Val(4))
  336.326 ns (4 allocations: 2.81 KiB)
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 (2, 1, 2, 1)
 (3, 1, 2, 1)
...

If you really prefer a matrix, instead of a vector of tuples, you can also do it very efficiently, but with less concise code:

julia> function mymatrix(A, VL::Val{L}) where L
                  result = Matrix{Int}(L, A^L)
                  for (i, c) in enumerate(myiterator(A, VL))
                      result[:, i] .= c.I
                  end
                  return result
              end
mymatrix (generic function with 1 method)

julia> @btime mymatrix(3, Val(4))
  1.075 ÎĽs (1 allocation: 2.69 KiB)
4Ă—81 Array{Int64,2}:
 1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  …  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3
 1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3     2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3     1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1     3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3

Incidentally, improvements in the compiler allow the matrix version to be almost as fast as the tuple version in v0.7

EDIT: forgot the v0.6 definition of mytuples

1 Like

Thanks @pablosanjose!

But I still don’t understand why in my original code (first post in this thread) the line Base.next(S::Sequences, ... where I return a view into a column is allocating so much. I don’t understand what’s wrong with it??

You perhaps intended to do something different than what you did. You defined a method Base.next(::Sequence, state) that implements the iterator API for the Sequence type. However, you are actually never calling this function. When you do collect(sequences_generator(L, A)) the output of sequence_generator is not a Sequence but a Generator (that’s a different type). Hence, collect is never calling your next function, but the definition in Base for Generators:

julia> @which next(sequences_generator(3, 5), 1)
next(g::Base.Generator, s) in Base at generator.jl:4

It is that function that is allocating so much, not your next function.

I am measuring the allocations within the last call to do_something(sequences) in my code snippet. The pre-allocation of the sequences themselves is of course expected.
What troubles me is the allocations of the view.

Ah, right, sorry. But then you are not allocating so much. Don’t forget views do allocate, apparently. (I think I read some explanation from yuyichao somewhere). You are just seeing the cost of that allocation, right?

julia> @btime next($(Sequences(3,5)), 1)
  15.995 ns (2 allocations: 80 bytes)
([1, 1, 1], 2)

julia> a = rand(3,10); @btime (view($a, :, 1), 1);
  15.064 ns (2 allocations: 80 bytes)

Maybe you were expecting view to not allocate at all?

1 Like

Yes, I was. But kristoffer.carlsson just suggested that I could use UnsafeArrays.jl, which doesn’t allocate.

I can give a “layman’s” explanation for why view allocates. The reason is that view is a small struct in which one field is (a pointer to) the original array. As long as the view is alive, the original array cannot be deallocated. So the garbage collector needs to know all the views that exist for a given array. This is possible only if the view itself is heap allocated, since otherwise the run-time system cannot keep track of when the view is copied.

The compiler keeps getting better at catching use-cases in which it can prove that small structures like this are short-lived and not copied, and in this case the compiler can elide the allocation.

5 Likes

I ended up using this. It is working very well and fast. Thanks all @dpsanders @stevengj @pablosanjose @Stephen_Vavasis for your replies.

3 Likes

To be explicit, you asked if there was a package that provided unsafe views that where guaranteed not to allocate, and I linked that package. I didn’t say that using unsafe views was the best way to solve this particular problem.

1 Like

Yes, of course. Thanks

The only thing that bothers me with Base.Iterators.product([1:3 for i in 1:4]...) is that the generator generates the sequences in a 3x3x3x3 array. Is there a way to generate a flat array?

I know I can vec(collect(...)) it, but that also materializes the iterator, which I don’t want. vec applied on the generator directly doesn’t work. I also tried Iterators.flatten, but that flattens too much (even the tuples get flattened).

Update: I found my answer on the Slack:

collect(Iterators.flatten((Base.Iterators.product((1:3 for i in 1:4)...),)))
1 Like