A good way to iterate through sequences?

e3c6 · June 16, 2018, 3:51pm

I need to iterate through all sequences of integers between 1 and A, and of length L. Here A,L are positive integers. For example, if A = 2, L = 2, the sequences are:

 [1, 1]
 [2, 1]
 [1, 2]
 [2, 2]

The iteration should be as fast as possible. I was thinking of generating the sequences before-hand, and store them in an L times A^L matrix. Then I can loop through the columns of this matrix (as views).

To test this idea, I created a script, myscript.jl with the following contents:

struct Sequences
    sequences::Array{Int, 2}
end

Base.start(S::Sequences) = 1
Base.done(S::Sequences, state) = state > size(S.sequences, 2)
Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1)
Base.eltype(::Type{Sequences}) = SubArray{Int64, 1, Array{Int64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}
Base.length(S::Sequences) = size(S.sequences, 2)

function sequences_generator(L::Integer, A::Integer)
    @assert L > 0
    @assert A > 1
    (digits.(r, A, L) .+ 1 for r = 0 : A^L - 1)
end

function Sequences(L::Integer, A::Integer)
    @assert L > 0 && A > 1
    sequences = hcat(collect(sequences_generator(L, A))...)
    Sequences(sequences)
end

function do_something(sequences)
    Z = 0.
    for sequence in sequences
        Z += rand()
    end
    return Z
end


const sequences = Sequences(3, 5)

do_something(sequences)   # pre-compile

Profile.clear_malloc_data()  # only want to measure allocations of next call, so clean slate.

do_something(sequences)

Then I called this script using julia --track-allocation=user myscript.jl, which generates a report file called myscript.jl.mem containing memory allocations per line.

Everything looks fine except for the line Base.next(S::Sequences, state) = (view(S.sequences, :, state), state + 1), which is allocating a lot of memory. Any ideas of how I can reduce the allocation footprint here? Or perhaps a different approach I can try?

dpsanders · June 16, 2018, 5:00pm

I solved a very similar problem recently; check out the commented TupleGenerator code here:

://github.com/dpsanders/LazyTaylorSeries.jl/blob/master/src/taylorN.jl

Note that to avoid allocations you should use tuples, not arrays.

e3c6 · June 16, 2018, 5:14pm

Note that I would like to avoid having to “compute the next sequence” if possible. That’s why I wanted to pre-allocate them, maybe a as an Array of Tuples? But I am not sure if this is the best approach.

dpsanders · June 16, 2018, 5:30pm

julia> tuples = Base.Iterators.product([1:3 for i in 1:4]...)
Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}}(1:3, Base.Iterators.Prod{UnitRange{Int64},Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}}(1:3, Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}}(1:3, 1:3)))

julia> vec(collect(tuples))
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 ⋮
 (3, 3, 2, 3)
 (1, 1, 3, 3)
 (2, 1, 3, 3)
 (3, 1, 3, 3)
 (1, 2, 3, 3)
 (2, 2, 3, 3)
 (3, 2, 3, 3)
 (1, 3, 3, 3)
 (2, 3, 3, 3)
 (3, 3, 3, 3)

Since you said you wanted to iterate through, I thought you wanted an iterator.
Generating the next tuple is very fast, and saves the memory needed for the array. It depends on your exact use case what would be best.

e3c6 · June 16, 2018, 6:12pm

Why an array of tuples is better than a matrix?

stevengj · June 16, 2018, 6:13pm

Sounds like you want this: Multidimensional algorithms and iteration

pablosanjose · June 16, 2018, 6:18pm

Yes, as @stevengj pointed out, I’d say the ideal way is

julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, Val(L)))
myiterator (generic function with 1 method)

julia> vec([Tuple(c) for c in myiterator(3,4)])
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 (2, 1, 2, 1)
....

EDIT: note that if you are not in 0.7, you have a different form of the CartesianIndex interface, IIRC

pablosanjose · June 16, 2018, 6:31pm

Just for completeness, some slight optimization can be achieved passing Val(L) instead of L so that the ntuple can be statically generated

julia> myiterator(A,L) = CartesianIndices(ntuple(_ -> 1:A, L))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([Tuple(c) for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3, Val(4));
 334.193 ns (5 allocations: 2.89 KiB)

e3c6 · June 16, 2018, 6:52pm

The code doesn’t work in v0.6.3. It doesn’t find CartesianIndices.

pablosanjose · June 16, 2018, 7:02pm

Yep, in 0.6 you need this form

julia> myiterator(A, L) = CartesianRange(CartesianIndex(ntuple(_ -> A, L)))
myiterator (generic function with 1 method)

julia> mytuples(A, L) = vec([c.I for c in myiterator(A, L)])
mytuples (generic function with 1 method)

julia> @btime mytuples(3,Val(4))
  336.326 ns (4 allocations: 2.81 KiB)
81-element Array{NTuple{4,Int64},1}:
 (1, 1, 1, 1)
 (2, 1, 1, 1)
 (3, 1, 1, 1)
 (1, 2, 1, 1)
 (2, 2, 1, 1)
 (3, 2, 1, 1)
 (1, 3, 1, 1)
 (2, 3, 1, 1)
 (3, 3, 1, 1)
 (1, 1, 2, 1)
 (2, 1, 2, 1)
 (3, 1, 2, 1)
...

If you really prefer a matrix, instead of a vector of tuples, you can also do it very efficiently, but with less concise code:

julia> function mymatrix(A, VL::Val{L}) where L
                  result = Matrix{Int}(L, A^L)
                  for (i, c) in enumerate(myiterator(A, VL))
                      result[:, i] .= c.I
                  end
                  return result
              end
mymatrix (generic function with 1 method)

julia> @btime mymatrix(3, Val(4))
  1.075 μs (1 allocation: 2.69 KiB)
4×81 Array{Int64,2}:
 1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  …  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3
 1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3     2  2  2  3  3  3  1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3     1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1     3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3

Incidentally, improvements in the compiler allow the matrix version to be almost as fast as the tuple version in v0.7

EDIT: forgot the v0.6 definition of mytuples

e3c6 · June 16, 2018, 8:00pm

Thanks @pablosanjose!

But I still don’t understand why in my original code (first post in this thread) the line Base.next(S::Sequences, ... where I return a view into a column is allocating so much. I don’t understand what’s wrong with it??

pablosanjose · June 16, 2018, 9:09pm

You perhaps intended to do something different than what you did. You defined a method Base.next(::Sequence, state) that implements the iterator API for the Sequence type. However, you are actually never calling this function. When you do collect(sequences_generator(L, A)) the output of sequence_generator is not a Sequence but a Generator (that’s a different type). Hence, collect is never calling your next function, but the definition in Base for Generators:

julia> @which next(sequences_generator(3, 5), 1)
next(g::Base.Generator, s) in Base at generator.jl:4

It is that function that is allocating so much, not your next function.

e3c6 · June 16, 2018, 9:26pm

I am measuring the allocations within the last call to do_something(sequences) in my code snippet. The pre-allocation of the sequences themselves is of course expected.
What troubles me is the allocations of the view.

pablosanjose · June 16, 2018, 9:32pm

Ah, right, sorry. But then you are not allocating so much. Don’t forget views do allocate, apparently. (I think I read some explanation from yuyichao somewhere). You are just seeing the cost of that allocation, right?

julia> @btime next($(Sequences(3,5)), 1)
  15.995 ns (2 allocations: 80 bytes)
([1, 1, 1], 2)

julia> a = rand(3,10); @btime (view($a, :, 1), 1);
  15.064 ns (2 allocations: 80 bytes)

Maybe you were expecting view to not allocate at all?

e3c6 · June 16, 2018, 10:03pm

Yes, I was. But kristoffer.carlsson just suggested that I could use UnsafeArrays.jl, which doesn’t allocate.

Stephen_Vavasis · June 17, 2018, 12:44am

I can give a “layman’s” explanation for why view allocates. The reason is that view is a small struct in which one field is (a pointer to) the original array. As long as the view is alive, the original array cannot be deallocated. So the garbage collector needs to know all the views that exist for a given array. This is possible only if the view itself is heap allocated, since otherwise the run-time system cannot keep track of when the view is copied.

The compiler keeps getting better at catching use-cases in which it can prove that small structures like this are short-lived and not copied, and in this case the compiler can elide the allocation.

e3c6 · June 17, 2018, 1:51pm

I ended up using this. It is working very well and fast. Thanks all @dpsanders @stevengj @pablosanjose @Stephen_Vavasis for your replies.

kristoffer.carlsson · June 18, 2018, 8:28am

To be explicit, you asked if there was a package that provided unsafe views that where guaranteed not to allocate, and I linked that package. I didn’t say that using unsafe views was the best way to solve this particular problem.

e3c6 · June 18, 2018, 9:21am

Yes, of course. Thanks

e3c6 · June 26, 2018, 6:02pm

The only thing that bothers me with Base.Iterators.product([1:3 for i in 1:4]...) is that the generator generates the sequences in a 3x3x3x3 array. Is there a way to generate a flat array?

I know I can vec(collect(...)) it, but that also materializes the iterator, which I don’t want. vec applied on the generator directly doesn’t work. I also tried Iterators.flatten, but that flattens too much (even the tuples get flattened).

Update: I found my answer on the Slack:

collect(Iterators.flatten((Base.Iterators.product((1:3 for i in 1:4)...),)))

Topic		Replies	Views
Generic n-dimensional loop without allocations General Usage question	15	987	December 2, 2016
How to quickly create array of array? General Usage question	12	1028	February 20, 2018
Concatenating iterables without allocating memory Performance	34	7312	January 16, 2020
Common allocation mistakes Performance memory-allocation	47	7116	August 21, 2023
Work around splatting to avoid unecessary allocations Performance	2	2092	January 16, 2019

A good way to iterate through sequences?

Related topics