Iterators.product is not type stable?

I have a problem using product. I need a Cartesian product iterator but when I try using product it seems not type-stable as the following shows (Julia 1.0)

julia> m=[1:2,3:5]
2-element Array{UnitRange{Int64},1}:
 1:2
 3:5

julia> f(m::Vector{UnitRange{Int}})=collect(Iterators.product(m...))
f (generic function with 1 method)

julia> @code_warntype f(m)
Body::Any
1 1 ─ %1 = Base.Iterators.product::Core.Compiler.Const(Base.Iterators.product, false)
  β”‚   %2 = (Core._apply)(%1, m)::Base.Iterators.ProductIterator{_1} where _1     β”‚
  β”‚   %3 = (Main.collect)(%2)::Any                                               β”‚
  └──      return %3

Apparently, Julia cannot infer the eltype of the result. Is there a way to get a type-stable iterator on a Cartesian product of iterators? I would be happy with a β€œflat” result (the vec of the above result).

1 Like

That’s because it does not now how many iterators it will be taking the product of if you input them via a vector. If possible, use a tuple of iterators as input argument

julia> f(m)=collect(Iterators.product(m...))
f (generic function with 1 method)

julia> @code_warntype f([1:2,3:5])
Body::Any
1 1 ─ %1 = (Core._apply)(Base.Iterators.product, m)::Any                    β”‚
  β”‚   %2 = (Main.collect)(%1)::Any                                          β”‚
  └──      return %2    

julia> @code_warntype f((1:2,3:5))
Body::Array{Tuple{Int64,Int64},2}
1 1 ── %1  = (getfield)(m, 1)::UnitRange{Int64} 
  β”‚    %2  = (getfield)(m, 2)::UnitRange{Int64} 
  β”‚    %3  = (Core.tuple)(%1, %2)::Tuple{UnitRange{Int64},UnitRange{Int64}}
  β”‚    %4  = %new(Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}}, %3)::Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}}
  β”‚          (Base.ifelse)(true, 1, 0)            Colon
  β”‚    %6  = (Base.getfield)(%1, :stop)::Int64     _similar_for
  β”‚    %7  = (Base.getfield)(%1, :start)::Int64     axes
  β”‚    %8  = (Base.Checked.checked_ssub_int)(%6, %7)::Tuple{Int64,Bool}
  β”‚    %9  = (Base.getfield)(%8, 1, true)::Int64      _prod_axes1
  β”‚    %10 = (Base.getfield)(%8, 2, true)::Bool        axes
  └───       goto #3 if not %10  β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β•»              size
  2 ──       invoke Base.Checked.throw_overflowerr_binaryop(:-::Symbol, %6::Int64, %7::Int64)
  └───       $(Expr(:unreachable))│││││││││┃              checked_sub
  3 ──       goto #4             β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚    
  4 ── %15 = (Base.Checked.checked_sadd_int)(%9, 1)::Tuple{Int64,Bool}erflow
  β”‚    %16 = (Base.getfield)(%15, 1, true)::Int64           indexed_iterate
  β”‚    %17 = (Base.getfield)(%15, 2, true)::Bool            getindex
  └───       goto #6 if not %17  β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β•»              checked_add
  5 ──       invoke Base.Checked.throw_overflowerr_binaryop(:+::Symbol, %9::Int64, 1::Int64)
  └───       $(Expr(:unreachable))β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚    
  6 ──       goto #7             β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚    
  7 ──       goto #8             β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚     
  8 ──       goto #9             β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚      
  9 ── %24 = (Base.slt_int)(%16, 0)::Boolβ”‚β•»β•·β•·β•·           Type
  β”‚    %25 = (Base.ifelse)(%24, 0, %16)::Int64            Type
  └───       goto #10            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚       
  10 ─       goto #11            β”‚β”‚β”‚β”‚β”‚β”‚β”‚        
  11 ─ %28 = (Base.getfield)(%2, :stop)::Int64         _prod_axes1
  β”‚    %29 = (Base.getfield)(%2, :start)::Int64         axes
  β”‚    %30 = (Base.Checked.checked_ssub_int)(%28, %29)::Tuple{Int64,Bool}
  β”‚    %31 = (Base.getfield)(%30, 1, true)::Int64         length
  β”‚    %32 = (Base.getfield)(%30, 2, true)::Bool           checked_sub
  └───       goto #13 if not %32 β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β•»              checked_sub
  12 ─       invoke Base.Checked.throw_overflowerr_binaryop(:-::Symbol, %28::Int64, %29::Int64)
  └───       $(Expr(:unreachable))β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚   
  13 ─       goto #14            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚   
  14 ─ %37 = (Base.Checked.checked_sadd_int)(%31, 1)::Tuple{Int64,Bool}erflow
  β”‚    %38 = (Base.getfield)(%37, 1, true)::Int64            indexed_iterate
  β”‚    %39 = (Base.getfield)(%37, 2, true)::Bool             getindex
  └───       goto #16 if not %39 β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β•»              checked_add
  15 ─       invoke Base.Checked.throw_overflowerr_binaryop(:+::Symbol, %31::Int64, 1::Int64)
  └───       $(Expr(:unreachable))β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚   
  16 ─       goto #17            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚   
  17 ─       goto #18            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚    
  18 ─       goto #19            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚     
  19 ─ %46 = (Base.slt_int)(%38, 0)::Boolβ”‚β”‚β•»β•·β•·β•·           Type
  β”‚    %47 = (Base.ifelse)(%46, 0, %38)::Int64             Type
  └───       goto #20            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚      
  20 ─       goto #21            β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚       
  21 ─       goto #22            β”‚β”‚β”‚β”‚β”‚β”‚β”‚        
  22 ─       goto #23            β”‚β”‚β”‚β”‚β”‚β”‚         
  23 ─       goto #24            β”‚β”‚β”‚β”‚β”‚          
  24 ─ %53 = $(Expr(:foreigncall, :(:jl_alloc_array_2d), Array{Tuple{Int64,Int64},2}, svec(Any, Int64, Int64), :(:ccall), 3, Array{Tuple{Int64,Int64},2}, :(%25), :(%47), :(%47), :(%25)))::Array{Tuple{Int64,Int64},2}
  └───       goto #25            β”‚β”‚β”‚β”‚           
  25 ─ %55 = invoke Base.copyto!(%53::Array{Tuple{Int64,Int64},2}, %4::Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}})::Array{Tuple{Int64,Int64},2}
  └───       goto #26            β”‚β”‚β”‚            
  26 ─       goto #27            β”‚β”‚             
  27 ─       return %55          β”‚              
1 Like

Well, I want to make the Cartesian product of a bunch of AbstractVector{T}, whose number is
not known in advance. The result is (obviously) an AbstractVector of Vector{T} , since I want it just as a list
and do not care about fancy dimensioning. If product is not suitable, do I need to code everything myself or is there a solution I can copy from somewhere? And if I need to code the object myself, what should be
a suitable name? FlatCartesian?

Iterators.product would return tuples, not vectors, so if I understand you correctly, you might want something like

julia> f(iterators...) = vec([collect(x) for x in Iterators.product(iterators...)])
f (generic function with 2 methods)

julia> listofvecs = [1:2, 3:5]
2-element Array{UnitRange{Int64},1}:
 1:2
 3:5

julia> f(listofvecs...)
6-element Array{Array{Int64,1},1}:
 [1, 3]
 [2, 3]
 [1, 4]
 [2, 4]
 [1, 5]
 [2, 5]

Yes, this is how I code currently what I want. But it is not type-stable! This is my problem…

Edit: I am confounded! Encapsulated as a function f as you did the code seems type-stable while
the same code inline in a bigger function is not…

Edit again: it is not type-stable in the following sense:

julia> cartesian(l...)=vec([collect(x) for x in Iterators.product(l...)])
cartesian (generic function with 1 method)

julia> f(n)=cartesian([1:i for i in 2:n]...)
f (generic function with 1 method)

julia> @code_warntype f(3)
Body::Any
1 1 ─ %1 = (Base.sle_int)(2, n)::Bool                      β”‚β•»β•·β•·β•·β•· Colon
  β”‚        (Base.sub_int)(n, 2)                            β”‚β”‚β•»     Type
  β”‚   %3 = (Base.ifelse)(%1, n, 1)::Int64                  │││┃     unitrange_last
  β”‚   %4 = %new(UnitRange{Int64}, 2, %3)::UnitRange{Int64} β”‚β”‚β”‚   
  β”‚   %5 = %new(Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##13#14"))}, getfield(Main, Symbol("##13#14"))(), %4)::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##13#14"))}
  β”‚   %6 = invoke Base.collect(%5::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##13#14"))})::Array{UnitRange{Int64},1}
  β”‚   %7 = (Core._apply)(Main.cartesian, %6)::Any          β”‚     
  └──      return %7

The problem here is that the type of ([1:i for i in 2:n]...) is not inferred at compile time, so your input f(3) is not enough to know the type of the argument to cartesian (the length of the tuple). f is intrinsically type-unstable (different arguments of the same type Int results in different output types). The type of f(n) is therefore only known at runtime. One way that you can remain inferable is to pass your 3 as Val(3), as in the following, but I suspect this is not the kind of thing you will want to do in your real code

julia> cartesian(l...)=vec([collect(x) for x in Iterators.product(l...)])
cartesian (generic function with 1 method)

julia> f(::Val{N}) where {N} = cartesian(ntuple(i->1:(i+1), Val(N-1))...)
f (generic function with 1 method)

julia> @code_warntype f(Val(3))
Body::Array{Array{Int64,1},1}
1 1 ─       (Base.ifelse)(true, 2, 0)                                                                                                β”‚β•»β•·β•·β•·β•·  ntuple
 ...

EDIT: correction - f is not type-unstable, actually, only the intermediate ([1:i for i in 2:n]...) is, but this is enough to stop inferability. The compiler can only infer the output type of a function such as cartesian if it knows the type of its arguments to start with, I guess.

One more note. In the v0.7 world it seems type stability is not the final word on performance. Check this out:

julia> cartesian(l...)=vec([collect(x) for x in Iterators.product(l...)])
julia> f1(::Val{N}) where {N} = cartesian(ntuple(i->1:(i+1), Val(N-1))...);
julia> f2(n)=cartesian(ntuple(i->1:i+1, n-1)...);

Of these two functions f1 is type-stable, but f2 is not. However, they are both equally fast, and about x4 faster than your f above

julia> @btime f1(Val(3));
  229.989 ns (9 allocations: 784 bytes)
julia> @btime f2(3);
  229.444 ns (9 allocations: 784 bytes)

I am still getting used to the performance landscape of v1.0. It’s got so many tricks up its sleeve that I confess I still find it a little difficult to anticipate what will be fast and what will not…

1 Like

I wrote the following

struct Cartesian{T,U<:AbstractVector{T}}
  d::Vector{U}
end

function Base.iterate(C::Cartesian,cnt::Vector{Int}=begin
    v=map(length,C.d)
    if !(0 in v)
      v=fill(1,length(C.d))
      v[end]=0
    end
    v
  end)
  for i in length(cnt):-1:1
    if cnt[i]<length(C.d[i])
      cnt[i]+=1
      for j in i+1:length(cnt)
        cnt[j]=1
      end
      return map((a,b)->a[b],C.d,cnt),cnt
    end
  end
  return nothing
end

Base.length(C::Cartesian)=prod(length,C.d)

Base.eltype(::Type{Cartesian{T,U}}) where T where U=Vector{T}

but the timing of

g(n)=collect(Cartesian( [1:i for i in 2:n]))

is not better than of my function f above, even though this time g(3) is type-stable.
Is my code suboptimal in any way?

Maybe this?

struct Cartesian{N,T,U<:AbstractVector{T}}
  d::NTuple{N,U}
end

function Base.iterate(C::Cartesian{N}) where {N}
    v0 = length.(C.d)
    if (0 in v0)
      cnt = Int[]
    else 
      cnt = fill(1, N)
      cnt[end] = 0
    end
    Base.iterate(C, cnt)
end

function Base.iterate(C::Cartesian{N}, cnt::Vector{Int}) where {N}
  for i in N:-1:1
    if cnt[i] < length(C.d[i])
      cnt[i]+=1
      for j in i+1:N
        cnt[j]=1
      end
      return map((a,b)->a[b], C.d, cnt), cnt
    end
  end
  return nothing
end

Base.length(C::Cartesian) = prod(length,C.d)

Base.eltype(::Type{Cartesian{N,T}}) where {N,T} = Vector{T}

g(n)=collect(Cartesian(ntuple(i -> 1:i+1, n-1)))
julia> @btime g(3);
  378.480 ns (26 allocations: 1.34 KiB)

If you really need the result as a vector of vectors, this looks pretty much optimal to me, performance-wise, unless you want to use the built-in version I posted above…

An alternative, highly performant, and very elegant approach would probably be to use CartesianIndices

julia> g(n) = CartesianIndices(ntuple(i->1:i+1, n-1))
g (generic function with 1 method)

julia> @btime g(3)
  1.347 ns (0 allocations: 0 bytes)
2Γ—3 CartesianIndices{2,Tuple{UnitRange{Int64},UnitRange{Int64}}}:
 CartesianIndex(1, 1)  CartesianIndex(1, 2)  CartesianIndex(1, 3)
 CartesianIndex(2, 1)  CartesianIndex(2, 2)  CartesianIndex(2, 3)

I would say this is the truly Julian way to do this. What you obtain is not an array, but you can use it as such

julia> [[i.I...] for i in g(3)]
2Γ—3 Array{Array{Int64,1},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

EDIT: Note that the lazy CartesianIndices approach could be done also with Iterators.product as in the post above. That is, you could avoid collect until you really need it, and build a generator of tuples instead

julia> g(n) = Iterators.product(ntuple(i->1:i+1, n-1)...)
g (generic function with 1 method)

julia> @btime collect(g(3))
  37.055 ns (1 allocation: 176 bytes)
2Γ—3 Array{Tuple{Int64,Int64},2}:
 (1, 1)  (1, 2)  (1, 3)
 (2, 1)  (2, 2)  (2, 3)

Just one more comment, if I may. You might have realised that the main performance issue with the struct Cartesian approach you are following is that you want to end up allocating a Vector of Vectors, and each of these vectors has itself to be allocated on the heap. If you want performance you want to avoid all those allocations before anything else. The post above shows a bunch of ways to do that by exploiting the fact that you can know at compile time the length of all these vectors (n, or N, i.e. the number of ranges you provide), and thus build a Vector of NTuple{N,Int} or CartesianIndex instead of a Vector{Vector{Int}}. The NTuples and CartesianIndexes do not allocate, so that’s fast. Depending on what you need to do you could even pull it off with lazy generators, as pointed out.

If in your real code you absolutely want to have a Vector of AbstractVectors as output, because, say, you need to do some linear algebra on them later or whatever, what you want to do is return a Vector{SVector{N,Int}}, where SVector is from StaticArrays, and has a static length N, known at compile time. These SVectors are very similar to Tuples, but you can do all sort of linear algebra on them (check MVector too if you need mutation). StaticArrays is possibly one of the most important packages out there that almost everyone should use for performance (and that will probably become part of Base in some form eventually, according to some comments by Keno)

Then, for a Vector{<:AbstractVector} output, I would do this

julia> using StaticArrays
julia> g(n) = [SVector(i) for i in Iterators.product(ntuple(i->1:i+1, n-1)...)]
julia> @btime g(3)
  41.483 ns (2 allocations: 224 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

With this you can essentially do all you would be expecting to do with a Vector{Vector{Int}}, except mutation.

1 Like

You are right that writing 2 methods for iterate is clearer (and I found a bit faster) than putting everything in one.
However you still seem to miss the fundamental fact of my situation: I do not (and cannot) know at compile time how many AbstractVectors I want to take the Cartesian product of. I would love an approach which would:

  • work in the general case
  • be as efficient as possible in the particular case where I do know how many vectors I multiply
    Actually I think Iterators.product is very good when I know how many iterators I multiply.

Here is my fixed code writing 2 methods for iterate.

function Base.iterate(C::Cartesian)
  if any(isempty,C.d) return nothing end
  return map(a->a[1],C.d),fill(1,length(C.d))
end

function Base.iterate(C::Cartesian,cnt::Vector{Int})
  for i in length(cnt):-1:1
    if cnt[i]<length(C.d[i])
      cnt[i]+=1
      for j in i+1:length(cnt)
        cnt[j]=1
      end
      return map((a,b)->a[b],C.d,cnt),cnt
    end
  end
  return nothing
end

And by the way: CartesianIndices does not seem faster than the above code

Ah, ok, I had misunderstood. Ok, n is completely unknown. We could model that changing n by e.g. rand(n:n). Then the compiler has no way of knowing what n is at compile time. Even in that case I see a CartesianIndices implementeation outperforming your code by quite a margin (not so with Iterators.product, which requires splatting):

julia> g(n) = [SVector(i.I) for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))]
julia> @btime g(3)
  175.413 ns (4 allocations: 320 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

(I think the main reason is that the way you build your combinations exploits slicing such as vec1[vec2], which is always allocating, necessarily.)

I cannot confirm your timings on my computer:

julia> g(n) = [SVector(i.I) for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))]
g (generic function with 1 method)

julia> @btime g(3)
  776.602 ns (19 allocations: 832 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

julia> f(n)=collect(Cartesian( [1:i for i in 2:rand(n:n)]))
f (generic function with 1 method)

julia> @btime f(3)
  403.361 ns (27 allocations: 1.34 KiB)
6-element Array{Array{Int64,1},1}:
 [1, 1]
 [1, 2]
 [1, 3]
 [2, 1]
 [2, 2]
 [2, 3]

Mm, funny! Are you on v0.7/1.0?
(EDIT: yes, of course you are, you are using CartesianIndices…)

How interesting… Could you please run the following in a fresh 0.7/1.0 REPL?

julia> using StaticArrays, BenchmarkTools
julia> g(n) = [i.I for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))]
julia> @btime g(3)
julia> g(n) = [SVector(i.I) for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))]
julia> @btime g(3)

Yes, funny! In a fresh 1.0 REPL, I get about 155ns for both. The measurements are very much smaller
in a fresh REPL! While my solution with Cartesian is still 393ns

Uh, I think this could be a Julia bug:

julia> using BenchmarkTools
julia> g(n) = [SVector(i.I) for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))];
g (generic function with 1 method)
julia> g(3)
ERROR: UndefVarError: SVector not defined
Stacktrace:
 [1] (::getfield(Main, Symbol("##3#5")))(::CartesianIndex{2}) at ./none:0
 [2] collect(::Base.Generator{CartesianIndices{2,Tuple{UnitRange{Int64},UnitRange{Int64}}},getfield(Main, Symbol("##3#5"))}) at ./generator.jl:47
 [3] g(::Int64) at ./REPL[2]:1
 [4] top-level scope at none:0
julia> using StaticArrays
julia> @btime g(3)
  770.817 ns (19 allocations: 832 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

while

julia> using BenchmarkTools
julia> g(n) = [SVector(i.I) for i in CartesianIndices(ntuple(i->1:i+1, rand(n:n)-1))];
julia> using StaticArrays
julia> @btime g(3)
  164.820 ns (4 allocations: 320 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]

That is: running the function that uses an undefined type errors, but if we then import that type and run again, the function is slower that if the error is not triggered.

This seems unrelated to your question, so I’ll post it in a separate thread. In any case do let us know if the CartesianIndices approach is useful for your real code.

EDIT: Yes, I think I’ll just post it as an issue in Github directly.

Yes, this is exactly what I did. You should file an issue!

As for CartesianIndices, I still need to find how to extract efficiently a Vector or Tuple from my list of iterators
using one of them.

https://github.com/JuliaLang/julia/issues/29025

Ok. I’m not completely sure what you mean by that, but if the remaining issue is that you need your input to g to be of the form [1:i for i in 2:n] or similar, note that you can always do something like

julia> f(n) = [1:i for i in 2:n]
julia> g(ranges) = [SVector(i.I) for i in CartesianIndices(ntuple(i->ranges[i], length(ranges)))]
julia> @btime g($(f(3)))
  179.242 ns (5 allocations: 336 bytes)
2Γ—3 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 1]  [1, 2]  [1, 3]
 [2, 1]  [2, 2]  [2, 3]