Function like expand.grid in R

question

#1

I need a function similar to expand.grid in R, which takes a set of vectors and returns their Cartesian product.

Is there existing art? My first cut is

function expandgrid(vecs)
    expd = []   # vector of expanded vectors
    inner, outer = 1, prod(length, vecs)
    for v in vecs
        lv = length(v)
        outer ÷= lv
        push!(expd, repeat(v, inner=inner, outer=outer))
        inner *= lv
    end
    expd
end

although I’m not sure that expandgrid is the best name in Julia.

Can someone suggest a better name and/or function implementation?


#2

Does product() from Iterators.jl do what you need?

julia> expandgrid([[1,2,3], [4,5,6]])
2-element Array{Any,1}:
 [1, 2, 3, 1, 2, 3, 1, 2, 3]
 [4, 4, 4, 5, 5, 5, 6, 6, 6]

julia> using Iterators

julia> collect(product([1,2,3], [4,5,6]))
9-element Array{Tuple{Int64,Int64},1}:
 (1, 4)
 (2, 4)
 (3, 4)
 (1, 5)
 (2, 5)
 (3, 5)
 (1, 6)
 (2, 6)
 (3, 6)

#3

Yes, Iterators.product is what I need, thanks.


#4

Since 0.5 you can also just use Base.product:

julia> collect(Base.product([1,2,3],[4,5,6]))
3×3 Array{Tuple{Int64,Int64},2}:
 (1, 4)  (1, 5)  (1, 6)
 (2, 4)  (2, 5)  (2, 6)
 (3, 4)  (3, 5)  (3, 6)

You can just vec it (or use linear indexing) if you’d rather not have the shape.
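
A minimal sketch of that point, assuming Julia 1.x, where the same iterator is also reachable as Iterators.product. The collected product keeps one dimension per input, and vec flattens it with the first input varying fastest:

```julia
# Cartesian product of two vectors; collect gives a 3×3 matrix of tuples.
grid = collect(Iterators.product([1, 2, 3], [4, 5, 6]))

# vec flattens in column-major order, so the first input varies fastest.
flat = vec(grid)

@assert size(grid) == (3, 3)
@assert flat[2] == (2, 4)

# Linear indexing into the matrix reaches the same elements without reshaping.
@assert grid[4] == (1, 5)
```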


#5

Sorry to bring this up, but how can I further transform the vector of tuples created with vec into a table / dataframe (with each element of the tuple as an observation in its own column)? Thanks

julia> vec(collect(Base.product([1,2],["A", "B"])))

Tuple{Int64,String}[(1, "A"), (2, "A"), (1, "B"), (2, "B")]

Desired output:

4×2 DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1   │ 1 │ A │
│ 2   │ 2 │ A │
│ 3   │ 1 │ B │
│ 4   │ 2 │ B │

Or just a simple matrix / table is fine!


#6

I think this would work for what you are trying to do.

vector = vec(collect(Base.product([1,2],["A", "B"])))
df = DataFrame(map(x -> getindex.(vector, x), eachindex(first(vector))))

#7

Seems you want an “unzip” iterator, ref https://github.com/JuliaLang/julia/issues/13942.

This is an ugly way heh

julia> v = vec(collect(Base.product([1,2],["A", "B"])))
4-element Array{Tuple{Int64,String},1}:
 (1, "A")
 (2, "A")
 (1, "B")
 (2, "B")

julia> DataFrame(collect.(collect(zip(v...))))
4×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ A  │
│ 2   │ 2  │ A  │
│ 3   │ 1  │ B  │
│ 4   │ 2  │ B  │
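
For illustration only, a hypothetical unzip helper (the name and definition are assumptions, not something in Base) could be sketched with broadcasting over getindex. It avoids splatting the whole vector into zip:

```julia
# Hypothetical helper (not part of Base): split a vector of equal-length
# tuples into one vector per tuple position.
unzip(v) = ntuple(i -> getindex.(v, i), length(first(v)))

v = vec(collect(Iterators.product([1, 2], ["A", "B"])))

# xs collects the first element of every tuple, ys the second.
xs, ys = unzip(v)
```

Each resulting vector could then serve as one column of a table.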

#8

unzip would be so sweet to have tho


#9

Thanks, I would have never found the solution myself :sweat_smile:


#10

A trivial package (with unit tests etc) could solve this very quickly.


#11

Well, it seems that in my use case there is a collect or similar missing somewhere:

I am trying to apply a function (datagrid) to each column of a dataframe, then create a new dataframe which would be the “product” (all combinations) of the previous values.

using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

# Data
df = DataFrame(X=[0,2], Y=[10, 20])

X = colwise(x -> datagrid(x, n=3), df)

Unfortunately, if I apply the previous method, it seems to be missing one unzipping / collecting step which I am not sure how to insert:

X = vec(collect(Base.product(X)))
X = DataFrame(collect.(collect(zip(X...))))

2×1 DataFrame
│ Row │ x1                 │
├─────┼────────────────────┤
│ 1   │ [0.0, 1.0, 2.0]    │
│ 2   │ [10.0, 15.0, 20.0] │

Thank you


#12

It is not clear to me what the desired result of

collect(Base.product(X))

should be.
With your code it is

julia> collect(Base.product(X))
2-element Array{Tuple{Array{Float64,1}},1}:
 ([0.0, 1.0, 2.0],)
 ([10.0, 15.0, 20.0],)

But maybe it should be

julia> X=collect(Base.product(X...))
3×3 Array{Tuple{Float64,Float64},2}:
 (0.0, 10.0)  (0.0, 15.0)  (0.0, 20.0)
 (1.0, 10.0)  (1.0, 15.0)  (1.0, 20.0)
 (2.0, 10.0)  (2.0, 15.0)  (2.0, 20.0)

which then yields:

julia> X = DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │

I am just guessing here.
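
The difference between the two calls comes down to splatting. A small sketch, using a hypothetical cols variable in place of the X above and Iterators.product as the Julia 1.x spelling:

```julia
cols = [[0.0, 1.0, 2.0], [10.0, 15.0, 20.0]]

# Without splatting, product sees ONE iterable (the outer vector),
# so each result is a 1-tuple holding a whole column.
unsplatted = collect(Iterators.product(cols))
@assert unsplatted[1] == ([0.0, 1.0, 2.0],)

# With splatting, each column becomes its own argument,
# so each result pairs one value from every column.
splatted = collect(Iterators.product(cols...))
@assert size(splatted) == (3, 3)
@assert splatted[2, 3] == (1.0, 20.0)
```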


#13

perfect guess, that was what I wanted, thanks a lot :slight_smile:


#14

I forgot the vec in

julia> X=vec(collect(Base.product(X...)))
9-element Array{Tuple{Float64,Float64},1}:
 (0.0, 10.0)
 (1.0, 10.0)
 (2.0, 10.0)
 (0.0, 15.0)
 (1.0, 15.0)
 (2.0, 15.0)
 (0.0, 20.0)
 (1.0, 20.0)
 (2.0, 20.0)

but the end result is the same:

julia> X=DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │

:grin:


#15

It seems that the last step doesn’t work when there are three variables (it takes forever to compute):

These first steps work as expected:

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))
27-element Array{Tuple{Float64,Float64,Float64},1}:
 (1.0, 5.0, 8.0)
 (2.0, 5.0, 8.0)
 ...
 (2.0, 6.0, 9.0)
 (3.0, 6.0, 9.0)

However, the last line:

grid = DataFrame(collect.(collect(zip(grid...))))

gets stuck; the problem is apparently that zip(grid...) never finishes computing…

PS: the desired output would be a 27×3 DataFrame with each element of the tuple as a different column (the same as above but with 3 columns). Hope I am clear enough…


#16

This seems to be a bug in julia 1.0.0

julia 0.6.4:

julia> a=1:30
1:30

julia> collect(zip(a...))
0-dimensional Array{NTuple{30,Int64},0}:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)

julia 1.0.0:

julia> a=1:30
1:30

julia> zip(a...)

takes forever (at least several minutes; I didn’t wait until it came back to the REPL).

It seems to be a known issue:

I have commented in:


#17

This is how it works nicely in julia 0.6.4:

using DataFrames

function datagrid(X::Array{<:Number}; n::Int=10, kwargs...)
    collect(range(minimum(X), (maximum(X)-minimum(X))/(n-1), n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = getindex.(colwise(x -> datagrid(Array(x), n=3), df),1)
grid = vec(collect(Base.product(grid...)))
grid = DataFrame(collect.(collect(zip(grid...))))

27×3 DataFrames.DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │

#18

This is a julia 1.0.0 suggestion that avoids zip; only the last line changed:

using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))

grid = DataFrame(collect.([ getindex.(grid,t) for t in 1:length(grid[1]) ]))

27×3 DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │

#19

This is implemented in QuantEcon

julia> using QuantEcon, DataFrames

julia> DataFrame(gridmake([1,2,3], [4,5,6]))
9×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 4  │
│ 2   │ 2  │ 4  │
│ 3   │ 3  │ 4  │
│ 4   │ 1  │ 5  │
│ 5   │ 2  │ 5  │
│ 6   │ 3  │ 5  │
│ 7   │ 1  │ 6  │
│ 8   │ 2  │ 6  │
│ 9   │ 3  │ 6  │

QuantEcon is available for Julia ≥ 0.6 (including 1.0). If you don’t want to use the package, my suggested implementation is

function gridmake(arrays::AbstractVecOrMat...)
    l = size.(arrays, 1)
    nrows = prod(l)
    output = mapreduce(a_o -> repeat(a_o[1],
                                     inner = (a_o[2], 1),
                                     outer = (div(nrows, size(a_o[1], 1) * a_o[2]), 1)),
                       hcat,
                       zip(arrays, cumprod(prepend!(collect(l[1:end - 1]), 1))))
    return output
end

Note: Many of the implementations suggested above only work for AbstractVector and give the wrong result for AbstractMatrix. As a test case you can use:

using QuantEcon: gridmake
using BenchmarkTools: @btime

const x, y, z = 1:3, [10 20; 30 40], [100, 200];

arrays = [x, y, z];

# Define your own implementation here and call it magic
magic(arrays...) == gridmake(arrays...) # They match
@btime magic($arrays...) # Check efficiency
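
As a rough illustration of what could stand in for magic, here is a hypothetical vectors-only sketch built on Iterators.product. It is an assumption for illustration, not QuantEcon's implementation, and it does not handle the matrix input y from the test case above:

```julia
# Hypothetical stand-in for `magic`; accepts vectors only (no matrices).
function magic(vecs::AbstractVector...)
    # All combinations as a flat vector of tuples, first input varying fastest.
    tuples = vec(collect(Iterators.product(vecs...)))
    # One column per input: column j holds the j-th element of every tuple.
    return [t[j] for t in tuples, j in 1:length(vecs)]
end

m = magic([1, 2, 3], [4, 5, 6])
@assert size(m) == (9, 2)
@assert m[:, 1] == [1, 2, 3, 1, 2, 3, 1, 2, 3]
@assert m[:, 2] == [4, 4, 4, 5, 5, 5, 6, 6, 6]
```

The row order matches the 9×2 gridmake output shown earlier in this post.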