Cleanest way to generate all combinations of n arrays

What I am trying to get is all the possible values given an array [0, 1, 2] and a number n. The result for n=3 would be:

[0,0,0]
[0,0,1],
[0,0,2]
[0,1,0]
...
[2,2,2]

So basically all the numbers in base 3 given as separate elements in an array.

I have been trying to get this output by using Base.Iterators.product([0,1,2],[0,1,2],[0,1,2])and this does indeed work but I’m having trouble finding a way to be able to do so programatically, as I can’t pass the arguments to product separately.

I tried generating the list of arrays by using repeat([0,1,2], 1, n) and this does give me a matrix of n columns with each being one [0, 1, 2] arrays, but I can’t find a way to pass each of these columns separately as input to product()

3 Likes

splatting?

1 Like

Let me get you an answer.

I tried this but splatting a 3x3 array does not give you 3 1x3 arrays (and product expects n iterables as argument.

julia> x = repeat([1,2,3],1,3)
3×3 Array{Int64,2}:
 1  1  1
 2  2  2
 3  3  3

julia> collect(Base.Iterators.product(x...))
0-dimensional Array{NTuple{9,Int64},0}:
(1, 2, 3, 1, 2, 3, 1, 2, 3)

julia>

Probably not efficient but one way to do so is built the representation of this ternary number.

So you should do a loop up to 3 ^ n - 1 and for each number build its representation per index.

N = 3
lpad.(string.(0:N^N-1;base=N),N,'0').|>s->Int.(collect(s).-48)

Not the most efficient :slight_smile:

This is a naive Combinations iterator, doing something similar to what bennedich proposed.

import Base.iterate,Base.length
struct Combinations{T}
    itr::Vector{T}
    count::Int64
    itrsize::Int64
    function Combinations(itr::Vector{T},count::Int) where T
        new{T}(itr,Int64(count),length(itr))
    end
end

function iterate(c::Combinations,state::Int64=0)
    if state>=length(c)
        return nothing
    end
    indices=digits(state,base=c.itrsize,pad=c.count)
    [c.itr[i] for i in (indices .+1)],state+1
end

function length(c::Combinations)
    length(c.itr) ^ c.count
end

julia> collect(Combinations([0,1,2],3))
27-element Array{Any,1}:
 [0, 0, 0]
 [1, 0, 0]
 [2, 0, 0]
 [0, 1, 0]
 [1, 1, 0]
 [2, 1, 0]
 [0, 2, 0]
 [1, 2, 0]
 [2, 2, 0]
 [0, 0, 1]
 [1, 0, 1]
 [2, 0, 1]
 [0, 1, 1]
 [1, 1, 1]
 [2, 1, 1]
 [0, 2, 1]
 [1, 2, 1]
 [2, 2, 1]
 [0, 0, 2]
 [1, 0, 2]
 [2, 0, 2]
 [0, 1, 2]
 [1, 1, 2]
 [2, 1, 2]
 [0, 2, 2]
 [1, 2, 2]
 [2, 2, 2]

Here’s how to do it with Iterators.product:

N = 3
reverse.(Iterators.product(fill(0:N-1,N)...))[:]
4 Likes

Recursive lambda comprehension:

N = 3
(p=(a,c)->c<2 ? a : [[x;y] for x=a for y=p(a,c-1)])(0:N-1,N)
1 Like

Here’s a way using digits:

fun(arr::Vector, num::Int) = 
    [ getindex.(Ref(arr), 1 .+ digits(i-1; base=length(arr), pad=num)) for i=1:length(arr)^num ]

fun(["oh","1","2"],3)

Briefer:

N = 3
reverse.(digits.(0:N^N-1,base=N,pad=N))
1 Like

You can also try:

Iterators.product(ntuple(i->0:N-1, N)...)
3 Likes

See: Julia docs FAQ - The two uses of the ... operator - slurping and splatting

2 Likes

Does anybody know an optimized solution for this problem ?
I need to write a code that generates all of the possible patterns in a set of categorical data and it turns out that generating the combination is really the step that makes the overall operation incredibly slow (takes about 25 minutes tand then ends up into a memory error when I do it with real data).
I have tried using for loops :

function withloops(x,y)
    leny=length(y)
    lenx=length(x)
    m=leny*lenx
    OUT = zeros(Float64, m,2)
    c=1
    for i = 1:lenx
        for j = 1:leny
            OUT[c,1] = x[i]
            OUT[c,2] = y[j]
            c+=1
        end
    end
    return OUT
end

I also tried what @bennedich and @improbable22 suggested :

reverse.(Iterators.product(fill(0:N-1,N)...))[:] 

and

[ getindex.(Ref(arr), 1 .+ digits(i-1; base=length(arr), pad=num)) for i=1:length(arr)^num ]

turns out solution using Iterators.product is the most efficient but it still is too much for my computer.
Is there a better way to do it ?

You realize this is an exponential operation in terms of both time and memory, right? It is O(K^N) where K is the base and N is the length of the number. Moreover, by asking to materialize all of these digit combinations all at once, it forces the problem to not only have exponential run time but also to allocate exponential memory. This brute force approach is only going to work for tiny toy problems.

You’d be much better off not materializing all the arrays at once. Even better still would be not materializing then at all and allowing them to be represented implicitly as integers: you can then iterate through them very quickly and ask properties of them as integers. If you want something really fast and scalable you need to avoid considering most of the combinations at all.

Hard to give more advice without knowing what you’re trying to compute.

3 Likes

@StefanKarpinski Yes you are right … but I have no idea how to do this with integers.
What I am trying to do is the following : given N categories, find out all the possible combinations of n (among N) elements, with allowed repetition. Then look at the frequency of occurence of each of these patterns in the data to try to identify patterns.
Since I also want to look at the possibility to vary the space ordering of the elements inside of a pattern, it does indeed generate a huge amount of combinations.

If it’s based on data then you know that all K^N combinations cannot possibly happen for large values of K and N. So instead of scanning for all the possibilities—which would be O(D*K^N) where D is the size of the data—just scan the data once and record all the combinations that actually occur which will only be O(D). It’s a linear scan of your data; hard to do better than that.

3 Likes

for my problem this helped me
collect(Iterators.product(-1:1, -1:1))[:]

1 Like