# Function like expand.grid in R

I need a function similar to `expand.grid` in R, which takes a set of vectors and returns their Cartesian product.

Is there existing art? My first cut is

``````
function expandgrid(vecs)
    expd = []   # vector of expanded vectors
    inner, outer = 1, prod(length, vecs)
    for v in vecs
        lv = length(v)
        outer ÷= lv
        push!(expd, repeat(v, inner=inner, outer=outer))
        inner *= lv
    end
    expd
end
``````

although I'm not sure that `expandgrid` is the best name in Julia.

Can someone suggest a better name and/or function implementation?

Does `product()` from Iterators.jl do what you need?

``````
julia> expandgrid([[1,2,3], [4,5,6]])
2-element Array{Any,1}:
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[4, 4, 4, 5, 5, 5, 6, 6, 6]

julia> using Iterators

julia> collect(product([1,2,3], [4,5,6]))
9-element Array{Tuple{Int64,Int64},1}:
(1, 4)
(2, 4)
(3, 4)
(1, 5)
(2, 5)
(3, 5)
(1, 6)
(2, 6)
(3, 6)
``````

Yes, `Iterators.product` is what I need, thanks.

Since 0.5 you can also just use `Base.product`:

``````
julia> collect(Base.product([1,2,3],[4,5,6]))
3×3 Array{Tuple{Int64,Int64},2}:
(1, 4)  (1, 5)  (1, 6)
(2, 4)  (2, 5)  (2, 6)
(3, 4)  (3, 5)  (3, 6)
``````

You can just `vec` it (or use linear indexing) if you'd rather not have the shape.
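
To make the `vec` suggestion concrete, here is a small sketch. It uses `Iterators.product`, which is the documented name in Julia 1.x; the variable names `a`, `b`, `grid`, and `combos` are only for illustration. It also checks that the flattened order agrees with the `repeat`-based columns from the first post: the first vector varies fastest.

```julia
a, b = [1, 2, 3], [4, 5, 6]

# 3×3 matrix of tuples; entry [i, j] is (a[i], b[j])
grid = collect(Iterators.product(a, b))

# Flatten in column-major order into a 9-element vector of tuples
combos = vec(grid)

# Linear indexing reaches the same element without reshaping
grid[4] == combos[4] == (1, 5)

# The columns agree with the repeat-based approach
first.(combos) == repeat(a, outer=length(b))   # [1, 2, 3, 1, 2, 3, 1, 2, 3]
last.(combos)  == repeat(b, inner=length(a))   # [4, 4, 4, 5, 5, 5, 6, 6, 6]
```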


Sorry to bring this up, but how can I further transform the vector of tuples created with `vec` into a table / dataframe (with each element of the tuple as an observation in its own column)? Thanks.

``````
julia> vec(collect(Base.product([1,2],["A", "B"])))

Tuple{Int64,String}[(1, "A"), (2, "A"), (1, "B"), (2, "B")]
``````

Desired output:

``````
4×2 DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1   │ 1 │ A │
│ 2   │ 2 │ A │
│ 3   │ 1 │ B │
│ 4   │ 2 │ B │
``````

Or just a simple matrix / table is fine!

I think this would work for what you are trying to do.

``````
vector = vec(collect(Base.product([1,2],["A", "B"])))
df = DataFrame(map(x -> getindex.(vector, x), eachindex(first(vector))))
``````

Seems you want an "`unzip`" iterator; see https://github.com/JuliaLang/julia/issues/13942.
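
In the meantime, a tiny helper can cover the vector-of-tuples case. This is only a sketch: the name `unzip` below is my own definition, not an existing Base function, and it assumes a non-empty vector whose tuples all have the same length.

```julia
# Hypothetical helper (not part of Base): split a vector of equal-length
# tuples into a tuple of column vectors, one per tuple position.
unzip(v) = ntuple(i -> getindex.(v, i), length(first(v)))

v = vec(collect(Iterators.product([1, 2], ["A", "B"])))
xs, ys = unzip(v)
# xs == [1, 2, 1, 2]
# ys == ["A", "A", "B", "B"]
```

Because it indexes each tuple position rather than splatting `v` into `zip`, the number of function arguments does not grow with the number of rows.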

This is an ugly way heh

``````
julia> v = vec(collect(Base.product([1,2],["A", "B"])))
4-element Array{Tuple{Int64,String},1}:
(1, "A")
(2, "A")
(1, "B")
(2, "B")

julia> DataFrame(collect.(collect(zip(v...))))
4×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ A  │
│ 2   │ 2  │ A  │
│ 3   │ 1  │ B  │
│ 4   │ 2  │ B  │
``````

`unzip` would be so sweet to have tho

Thanks, I would have never found the solution myself

A trivial package (with unit tests etc) could solve this very quickly.

Well, it seems that in my use case there is a `collect` or similar missing somewhere:

I am trying to apply a function (`datagrid`) to each column of a dataframe, then create a new dataframe which would be the "product" (all combinations) of the previous values.

``````
using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

# Data
df = DataFrame(X=[0,2], Y=[10, 20])

X = colwise(x -> datagrid(x, n=3), df)
``````

Unfortunately, if I apply the previous method, it seems to be missing one unzipping / collecting step, and I am not sure where to insert it:

``````
X = vec(collect(Base.product(X)))
X = DataFrame(collect.(collect(zip(X...))))

2×1 DataFrame
│ Row │ x1                 │
├─────┼────────────────────┤
│ 1   │ [0.0, 1.0, 2.0]    │
│ 2   │ [10.0, 15.0, 20.0] │
``````

Thank you

It is not clear to me what the desired result of

``````
collect(Base.product(X))
``````

should be.

``````
julia> collect(Base.product(X))
2-element Array{Tuple{Array{Float64,1}},1}:
([0.0, 1.0, 2.0],)
([10.0, 15.0, 20.0],)
``````

But maybe it should be

``````
julia> X=collect(Base.product(X...))
3×3 Array{Tuple{Float64,Float64},2}:
(0.0, 10.0)  (0.0, 15.0)  (0.0, 20.0)
(1.0, 10.0)  (1.0, 15.0)  (1.0, 20.0)
(2.0, 10.0)  (2.0, 15.0)  (2.0, 20.0)
``````

which then yields:

``````
julia> X = DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │
``````

I am just guessing here.
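
For what it's worth, the difference between the two calls comes down to splatting. A minimal sketch, using a hand-written `cols` as a stand-in for the `colwise` result (the names `cols`, `single`, and `combos` are mine):

```julia
# Stand-in for the colwise result: a vector holding two column vectors
cols = [[0.0, 1.0, 2.0], [10.0, 15.0, 20.0]]

# Without splatting, product sees ONE iterator (the outer vector),
# so each element is a 1-tuple wrapping a whole column
single = collect(Iterators.product(cols))
size(single)        # (2,)
single[1]           # ([0.0, 1.0, 2.0],)

# With splatting, each column becomes its own argument,
# which gives every (x, y) combination
combos = collect(Iterators.product(cols...))
size(combos)        # (3, 3)
combos[2, 3]        # (1.0, 20.0)
```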

perfect guess, that was what I wanted, thanks a lot

I forgot the `vec` in

``````
julia> X=vec(collect(Base.product(X...)))
9-element Array{Tuple{Float64,Float64},1}:
(0.0, 10.0)
(1.0, 10.0)
(2.0, 10.0)
(0.0, 15.0)
(1.0, 15.0)
(2.0, 15.0)
(0.0, 20.0)
(1.0, 20.0)
(2.0, 20.0)
``````

but the end result is the same:

``````
julia> X=DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │
``````

It seems that the last step doesn't work when there are three variables (it takes forever to compute):

These first steps work as expected:

``````
df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))
``````
``````
27-element Array{Tuple{Float64,Float64,Float64},1}:
(1.0, 5.0, 8.0)
(2.0, 5.0, 8.0)
...
(2.0, 6.0, 9.0)
(3.0, 6.0, 9.0)
``````

However, the last line:

``````
grid = DataFrame(collect.(collect(zip(grid...))))
``````

gets stuck, and the problem is apparently related to `zip(grid...)`, which never finishes computing...

PS: the desired output would be a `27×3 DataFrame` with each element of the tuple as a different column (the same as above but with 3 columns). I hope that is clear enough.

This seems to be a bug in julia 1.0.0

julia 0.6.4:

``````
julia> a=1:30
1:30

julia> collect(zip(a...))
0-dimensional Array{NTuple{30,Int64},0}:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
``````

julia 1.0.0:

``````
julia> a=1:30
1:30

julia> zip(a...)
``````

takes forever (at least several minutes; I didn't wait for it to return to the REPL).

It seems to be a known issue:

I have commented in:

This is how it works nicely in julia 0.6.4:

``````
using DataFrames

function datagrid(X::Array{<:Number}; n::Int=10, kwargs...)
    collect(range(minimum(X), (maximum(X)-minimum(X))/(n-1), n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = getindex.(colwise(x -> datagrid(Array(x), n=3), df),1)
grid = vec(collect(Base.product(grid...)))
grid = DataFrame(collect.(collect(zip(grid...))))

27×3 DataFrames.DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │
``````

This is a julia 1.0.0 suggestion avoiding `zip`, only the last line changed:

``````
using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))

grid = DataFrame(collect.([ getindex.(grid,t) for t in 1:length(grid[1]) ]))

27×3 DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │
``````

This is implemented in QuantEcon

``````
julia> using QuantEcon, DataFrames

julia> DataFrame(gridmake([1,2,3], [4,5,6]))
9×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 4  │
│ 2   │ 2  │ 4  │
│ 3   │ 3  │ 4  │
│ 4   │ 1  │ 5  │
│ 5   │ 2  │ 5  │
│ 6   │ 3  │ 5  │
│ 7   │ 1  │ 6  │
│ 8   │ 2  │ 6  │
│ 9   │ 3  │ 6  │
``````

QuantEcon is available for Julia ≥ 0.6 (includes 1.0). If you don't want to use the package, my suggested implementation is

``````
function gridmake(arrays::AbstractVecOrMat...)
    l = size.(arrays, 1)
    nrows = prod(l)
    output = mapreduce(a_o -> repeat(a_o[1],
                                     inner = (a_o[2], 1),
                                     outer = (div(nrows, size(a_o[1], 1) * a_o[2]), 1)),
                       hcat,
                       zip(arrays, cumprod(prepend!(collect(l[1:end - 1]), 1))))
    return output
end
``````

Note: Many of the implementations suggested above only work for `AbstractVector` and give the wrong result for `AbstractMatrix`. As a test case you can use:

``````
using QuantEcon: gridmake
using BenchmarkTools: @btime

const x, y, z = 1:3, [10 20; 30 40], [100, 200];

arrays = [x, y, z];

# Put the implementation under test here and call it `magic`
magic(arrays...) == gridmake(arrays...) # They match
@btime magic($arrays...) # Check efficiency
``````
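
To illustrate what row-wise handling of a matrix looks like, here is a rough sketch using only Base. The names `rows` and `rowgrid` are made up for this example; this is not QuantEcon's implementation, it assumes a matrix argument should contribute whole rows, and it is not tuned for speed.

```julia
# Treat a vector as n rows of one entry, and a matrix as its rows
rows(a::AbstractVector) = [[x] for x in a]
rows(a::AbstractMatrix) = [a[i, :] for i in 1:size(a, 1)]

# Hypothetical helper: every combination of one row from each input,
# with the first input varying fastest
function rowgrid(arrays...)
    combos = vec(collect(Iterators.product(map(rows, arrays)...)))
    # Concatenate the chosen rows, then stack the results as matrix rows
    return permutedims(reduce(hcat, [reduce(vcat, c) for c in combos]))
end

rowgrid(1:2, [10 20; 30 40])
# 4×3 matrix with rows (1, 10, 20), (2, 10, 20), (1, 30, 40), (2, 30, 40)
```

A function like this could be dropped in as `magic` in the test above to compare results and timing.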

I know this is an old question but in case someone is still looking for a solution that works like the R expand.grid function (i.e. passing a list of named variables of any type and returning a data frame with the variable names as column names, each column of the type of the original variable, and all possible combinations of the different variables), this is my Julia-newbie attempt at it:

``````
using DataFrames

function expand_grid(; iters...)
    var_names = collect(keys(iters))
    var_itr = [1:length(x) for x in iters.data]
    var_ix = vcat([collect(x)' for x in Iterators.product(var_itr...)]...)
    out = DataFrame()
    for i = 1:length(var_names)
        out[:,var_names[i]] = collect(iters[i])[var_ix[:,i]]
    end
    return out
end

expand_grid(a=1:2, b=1.0:5.0, c=["one", "two", "three", "four"])
``````

There is most likely a more efficient or cleaner way to do this, but this is the best I could come up with that gives me what I expect from the R function.
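
One possibly cleaner variant, as a sketch only: build the columns straight from `Iterators.product` and return a `NamedTuple` of vectors, so no index matrix is needed. The name `expand_grid_columns` is hypothetical, and I am assuming every keyword argument is an iterable with a known length.

```julia
# Hypothetical helper: named iterables in, NamedTuple of column vectors out.
function expand_grid_columns(; vars...)
    colnames = keys(vars)                      # tuple of Symbols
    # All combinations as a flat vector of tuples; first variable varies fastest
    combos = vec(collect(Iterators.product(values(vars)...)))
    # One column per tuple position
    cols = ntuple(i -> [c[i] for c in combos], length(colnames))
    return NamedTuple{colnames}(cols)
end

g = expand_grid_columns(a=1:2, b=["x", "y", "z"])
# g.a == [1, 2, 1, 2, 1, 2]
# g.b == ["x", "x", "y", "y", "z", "z"]
```

A `NamedTuple` of equal-length vectors is a column table in the Tables.jl sense, so passing `g` to `DataFrame` should give the named columns, each keeping the element type of its input.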