# Function like expand.grid in R

I need a function similar to `expand.grid` in R, which takes a set of vectors and returns their Cartesian product.

Is there existing art? My first cut is

``````
function expandgrid(vecs)
    expd = []   # vector of expanded vectors
    inner, outer = 1, prod(length, vecs)
    for v in vecs
        lv = length(v)
        outer ÷= lv
        push!(expd, repeat(v, inner=inner, outer=outer))
        inner *= lv
    end
    expd
end
``````

although I’m not sure that `expandgrid` is the best name in Julia.

Can someone suggest a better name and/or function implementation?

Does `product()` from Iterators.jl do what you need?

``````julia> expandgrid([[1,2,3], [4,5,6]])
2-element Array{Any,1}:
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[4, 4, 4, 5, 5, 5, 6, 6, 6]

julia> using Iterators

julia> collect(product([1,2,3], [4,5,6]))
9-element Array{Tuple{Int64,Int64},1}:
(1, 4)
(2, 4)
(3, 4)
(1, 5)
(2, 5)
(3, 5)
(1, 6)
(2, 6)
(3, 6)
``````

Yes, `Iterators.product` is what I need, thanks.

Since 0.5 you can also just use `Base.product`:

``````julia> collect(Base.product([1,2,3],[4,5,6]))
3×3 Array{Tuple{Int64,Int64},2}:
(1, 4)  (1, 5)  (1, 6)
(2, 4)  (2, 5)  (2, 6)
(3, 4)  (3, 5)  (3, 6)
``````

You can just `vec` it (or use linear indexing) if you’d rather not have the shape.
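For illustration, here is a minimal sketch of both options using only Base. It assumes Julia 1.x, where the same iterator is reachable as `Iterators.product`; the variable names are made up for this example.

```julia
# 3×3 matrix of tuples, as in the example above
grid = collect(Iterators.product([1, 2, 3], [4, 5, 6]))

# Option 1: flatten to a 9-element vector (column-major order)
flat = vec(grid)

# Option 2: keep the matrix and use a single linear index
fourth = grid[4]
```

Both `flat[4]` and `grid[4]` refer to the same element, because linear indexing walks the matrix column by column.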


Sorry to bring this up again, but how can I further transform the vector of tuples created with `vec` into a table / dataframe, with each element of the tuple going into its own column? Thanks

``````julia> vec(collect(Base.product([1,2],["A", "B"])))

Tuple{Int64,String}[(1, "A"), (2, "A"), (1, "B"), (2, "B")]
``````

Desired output:

``````4×2 DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1   │ 1 │ A │
│ 2   │ 2 │ A │
│ 3   │ 1 │ B │
│ 4   │ 2 │ B │
``````

Or just a simple matrix / table is fine!

I think this would work for what you are trying to do.

``````vector = vec(collect(Base.product([1,2],["A", "B"])))
df = DataFrame(map(x -> getindex.(vector, x), eachindex(first(vector))))
``````
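If only the columns are needed, a Base-only sketch for the two-column case could look like this. The names `X` and `Y` come from the desired output above; as far as I know, a `NamedTuple` of vectors like `cols` can be passed to `DataFrame(cols)` to get named columns, but that step is left out here so the snippet runs without any package.

```julia
v = vec(collect(Iterators.product([1, 2], ["A", "B"])))

# One vector per tuple position; first/last are enough for 2-tuples
cols = (X = first.(v), Y = last.(v))
```

For tuples with more than two elements, `getindex.(v, i)` plays the role of `first.`/`last.`.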

Seems you want an “`unzip`” iterator, ref https://github.com/JuliaLang/julia/issues/13942.

This is an ugly way heh

``````julia> v = vec(collect(Base.product([1,2],["A", "B"])))
4-element Array{Tuple{Int64,String},1}:
(1, "A")
(2, "A")
(1, "B")
(2, "B")

julia> DataFrame(collect.(collect(zip(v...))))
4×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ A  │
│ 2   │ 2  │ A  │
│ 3   │ 1  │ B  │
│ 4   │ 2  │ B  │
``````

`unzip` would be so sweet to have tho
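In the meantime, a small helper along these lines might do. Note that `unzip` here is a hypothetical name defined in the snippet, not a Base function, and the sketch assumes a non-empty collection of tuples that all have the same length.

```julia
# Hypothetical helper (not part of Base): split a vector of N-tuples
# into a tuple of N vectors by broadcasting getindex over each position.
unzip(tuples) = ntuple(i -> getindex.(tuples, i), length(first(tuples)))

v = vec(collect(Iterators.product([1, 2], ["A", "B"])))
xs, ys = unzip(v)
```

Because it indexes by position instead of splatting, the number of function arguments does not grow with the number of rows.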

Thanks, I would never have found the solution myself. A trivial package (with unit tests etc.) could solve this very quickly.

Well, it seems that in my use case there is a `collect` or similar missing somewhere:

I am trying to apply a function (`datagrid`) to each column of a dataframe, then create a new dataframe which would be the “product” (all combinations) of the previous values.

``````using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

# Data
df = DataFrame(X=[0,2], Y=[10, 20])

X = colwise(x -> datagrid(x, n=3), df)
``````

Unfortunately, if I apply the previous method, one unzipping / collecting step seems to be missing, and I am not sure how to insert it:

``````X = vec(collect(Base.product(X)))
X = DataFrame(collect.(collect(zip(X...))))

2×1 DataFrame
│ Row │ x1                 │
├─────┼────────────────────┤
│ 1   │ [0.0, 1.0, 2.0]    │
│ 2   │ [10.0, 15.0, 20.0] │
``````

Thank you

It is not clear to me what the desired result of

``````collect(Base.product(X))
``````

should be.

``````julia> collect(Base.product(X))
2-element Array{Tuple{Array{Float64,1}},1}:
([0.0, 1.0, 2.0],)
([10.0, 15.0, 20.0],)
``````

But maybe it should be

``````julia> X=collect(Base.product(X...))
3×3 Array{Tuple{Float64,Float64},2}:
(0.0, 10.0)  (0.0, 15.0)  (0.0, 20.0)
(1.0, 10.0)  (1.0, 15.0)  (1.0, 20.0)
(2.0, 10.0)  (2.0, 15.0)  (2.0, 20.0)
``````

and then yields:

``````julia> X = DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │
``````

I am just guessing here.
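To spell out the difference between the two calls, here is a minimal Base-only sketch. `X` is rebuilt by hand to match the values shown above, and `single` and `splatted` are names made up for this example.

```julia
X = [[0.0, 1.0, 2.0], [10.0, 15.0, 20.0]]

# Without splatting, X itself is the only iterator:
# each element is a 1-tuple wrapping one whole vector.
single = collect(Iterators.product(X))

# With splatting, each inner vector becomes its own iterator:
# the result holds every (x, y) combination.
splatted = collect(Iterators.product(X...))
```

So `single` has 2 entries while `splatted` is a 3×3 matrix of pairs.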

Perfect guess, that was what I wanted, thanks a lot. I forgot the `vec` in

``````julia> X=vec(collect(Base.product(X...)))
9-element Array{Tuple{Float64,Float64},1}:
(0.0, 10.0)
(1.0, 10.0)
(2.0, 10.0)
(0.0, 15.0)
(1.0, 15.0)
(2.0, 15.0)
(0.0, 20.0)
(1.0, 20.0)
(2.0, 20.0)
``````

but end result is the same:

``````julia> X=DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │
``````

It seems that the last step doesn’t work when there are three variables (it takes forever to compute):

These first steps work as expected:

``````df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))
``````
``````27-element Array{Tuple{Float64,Float64,Float64},1}:
(1.0, 5.0, 8.0)
(2.0, 5.0, 8.0)
...
(2.0, 6.0, 9.0)
(3.0, 6.0, 9.0)
``````

However, the last line:

``````grid = DataFrame(collect.(collect(zip(grid...))))
``````

gets stuck; the problem appears to be `zip(grid...)`, which never finishes computing…

PS: the desired output would be a `27×3 DataFrame` with each element of the tuple as a different column (the same as above but with 3 columns). Hope I am clear enough…

This seems to be a bug in julia 1.0.0

julia 0.6.4:

``````julia> a=1:30
1:30

julia> collect(zip(a...))
0-dimensional Array{NTuple{30,Int64},0}:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
``````

julia 1.0.0:

``````julia> a=1:30
1:30

julia> zip(a...)
``````

takes forever (at least several minutes, I didn’t wait until it comes back to the REPL).
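A small sketch of what the splat does may help explain why it scales badly. This is my reading of the behaviour, not a diagnosis of the underlying issue; `rows` is a made-up four-row example.

```julia
rows = [(1.0, 5.0), (2.0, 5.0), (1.0, 6.0), (2.0, 6.0)]

# Splatting passes every row as a separate argument:
# zip(rows...) is zip(rows[1], rows[2], rows[3], rows[4]).
z = zip(rows...)

# Each collected element is one "column", stored as a tuple
# with as many entries as there are rows.
cols = collect(z)
```

The tuple width, and with it the number of arguments to `zip`, therefore grows with the number of rows, which is presumably what the compiler struggles with for 27 or 30 entries.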

It seems to be a known issue:

I have commented in:

This is how it works nicely in julia 0.6.4:

``````using DataFrames

function datagrid(X::Array{<:Number}; n::Int=10, kwargs...)
    collect(range(minimum(X), (maximum(X)-minimum(X))/(n-1), n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = getindex.(colwise(x -> datagrid(Array(x), n=3), df),1)
grid = vec(collect(Base.product(grid...)))
grid = DataFrame(collect.(collect(zip(grid...))))

27×3 DataFrames.DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │
``````

This is a julia 1.0.0 suggestion avoiding `zip`, only the last line changed:

``````using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))

grid = DataFrame(collect.([ getindex.(grid,t) for t in 1:length(first(grid)) ]))

27×3 DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │
``````

This is implemented in QuantEcon

``````julia> using QuantEcon, DataFrames

julia> DataFrame(gridmake([1,2,3], [4,5,6]))
9×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 4  │
│ 2   │ 2  │ 4  │
│ 3   │ 3  │ 4  │
│ 4   │ 1  │ 5  │
│ 5   │ 2  │ 5  │
│ 6   │ 3  │ 5  │
│ 7   │ 1  │ 6  │
│ 8   │ 2  │ 6  │
│ 9   │ 3  │ 6  │

``````

QuantEcon is available for Julia ≥ 0.6 (including 1.0). If you don’t want to use the package, my suggested implementation is

``````
function gridmake(arrays::AbstractVecOrMat...)
    l = size.(arrays, 1)
    nrows = prod(l)
    # Each a_o pairs an array (a_o[1]) with the product of the row counts before it (a_o[2])
    output = mapreduce(a_o -> repeat(a_o[1],
                                     inner = (a_o[2], 1),
                                     outer = (div(nrows, size(a_o[1], 1) * a_o[2]), 1)),
                       hcat,
                       zip(arrays, cumprod(prepend!(collect(l[1:end - 1]), 1))))
    return output
end
``````

Note: Many of the implementations suggested above only work for `AbstractVector` and give the wrong result for `AbstractMatrix`. As a test case you can use,

``````using QuantEcon: gridmake
using BenchmarkTools: @btime

const x, y, z = 1:3, [10 20; 30 40], [100, 200];

arrays = [x, y, z];

# Here goes the implementation call it magic
magic(arrays...) == gridmake(arrays...) # They match
@btime magic($arrays...) # Check efficiency
``````
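As one possible candidate for such a test, here is an index-based sketch that keeps matrix rows together. `rowgrid` is a hypothetical name, this is not QuantEcon's implementation, and I am assuming the intended row order is the one shown in the `gridmake` output above (first argument varying fastest).

```julia
# Hypothetical helper: combine the rows of vectors and matrices.
function rowgrid(arrays::AbstractVecOrMat...)
    # All combinations of row indices, first array varying fastest
    combos = vec(collect(Iterators.product(map(a -> axes(a, 1), arrays)...)))
    blocks = map(1:length(arrays)) do k
        # Treat vectors as n×1 matrices so rows can be selected uniformly
        a = reshape(arrays[k], size(arrays[k], 1), :)
        a[getindex.(combos, k), :]
    end
    return reduce(hcat, blocks)
end

out = rowgrid(1:3, [10 20; 30 40], [100, 200])
```

With these inputs `out` has 3 × 2 × 2 = 12 rows and 1 + 2 + 1 = 4 columns; each row of the matrix argument stays intact.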

I know this is an old question but in case someone is still looking for a solution that works like the R expand.grid function (i.e. passing a list of named variables of any type and returning a data frame with the variable names as column names, each column of the type of the original variable, and all possible combinations of the different variables), this is my Julia-newbie attempt at it:

``````
using DataFrames

function expand_grid(; iters...)
    var_names = collect(keys(iters))
    var_itr = [1:length(x) for x in iters.data]
    var_ix = vcat([collect(x)' for x in Iterators.product(var_itr...)]...)
    out = DataFrame()
    for i = 1:length(var_names)
        out[:,var_names[i]] = collect(iters[i])[var_ix[:,i]]
    end
    return out
end

expand_grid(a=1:2, b=1.0:5.0, c=["one", "two", "three", "four"])
``````

There is most likely a more efficient or cleaner way to do this but this is the best I could come up with that would give me what I expect from the R function.
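One possibly shorter variant, sketched with Base only: take the product of the values directly and split the tuples by position. `expand_grid_cols` is a hypothetical name, and it returns a `NamedTuple` of column vectors instead of a data frame; as far as I know `DataFrame(result)` would turn that into a table with the same column names, but I have left that step out so the snippet has no dependencies.

```julia
# Hypothetical variant: returns a NamedTuple of columns, one per keyword.
function expand_grid_cols(; iters...)
    vals = values(iters)   # NamedTuple of the keyword arguments
    # Every combination, first variable varying fastest
    combos = vec(collect(Iterators.product(vals...)))
    cols = ntuple(i -> getindex.(combos, i), length(vals))
    return NamedTuple{keys(vals)}(cols)
end

g = expand_grid_cols(a=1:2, b=1.0:5.0, c=["one", "two", "three", "four"])
```

Each column keeps the element type of its input, so `g.a` holds integers, `g.b` floats and `g.c` strings.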