Function like expand.grid in R

dmbates · June 19, 2017, 5:37pm

I need a function similar to expand.grid in R, which takes a set of vectors and returns their Cartesian product.

Is there existing art? My first cut is

function expandgrid(vecs)
    expd = []   # vector of expanded vectors
    inner, outer = 1, prod(length, vecs)
    for v in vecs
        lv = length(v)
        outer ÷= lv
        push!(expd, repeat(v, inner=inner, outer=outer))
        inner *= lv
    end
    expd
end

although I’m not sure that expandgrid is the best name in Julia.

Can someone suggest a better name and/or function implementation?

rdeits · June 19, 2017, 5:51pm

Does product() from Iterators.jl do what you need?

julia> expandgrid([[1,2,3], [4,5,6]])
2-element Array{Any,1}:
 [1, 2, 3, 1, 2, 3, 1, 2, 3]
 [4, 4, 4, 5, 5, 5, 6, 6, 6]

julia> using Iterators

julia> collect(product([1,2,3], [4,5,6]))
9-element Array{Tuple{Int64,Int64},1}:
 (1, 4)
 (2, 4)
 (3, 4)
 (1, 5)
 (2, 5)
 (3, 5)
 (1, 6)
 (2, 6)
 (3, 6)

dmbates · June 19, 2017, 5:54pm

Yes, Iterators.product is what I need, thanks.

jebej · June 19, 2017, 9:39pm

Since 0.5 you can also just use Base.product:

julia> collect(Base.product([1,2,3],[4,5,6]))
3×3 Array{Tuple{Int64,Int64},2}:
 (1, 4)  (1, 5)  (1, 6)
 (2, 4)  (2, 5)  (2, 6)
 (3, 4)  (3, 5)  (3, 6)

You can just vec it (or use linear indexing) if you’d rather not have the shape.
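
A minimal sketch of that flattening step, assuming a Julia 1.x session where the same iterator is reached under the name `Iterators.product`. `vec` reshapes the collected matrix into a vector in column-major order, so the first input varies fastest, and linear indexing into the matrix visits the elements in the same order.

```julia
# Collect the Cartesian product as a matrix of tuples, then flatten it.
grid = collect(Iterators.product([1, 2, 3], [4, 5, 6]))  # 3×3 matrix of tuples
flat = vec(grid)                                         # 9-element vector of tuples

# Linear indexing into the matrix walks the same column-major order as vec.
same_order = all(grid[i] == flat[i] for i in 1:length(flat))
```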

DominiqueMakowski · September 12, 2018, 9:35pm

Sorry to bring this up, but how can I further transform the vector of tuples created with vec into a table / dataframe (with each element of the tuple as observation of a column)? Thanks

julia> vec(collect(Base.product([1,2],["A", "B"])))

Tuple{Int64,String}[(1, "A"), (2, "A"), (1, "B"), (2, "B")]

Desired output:

4×2 DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1   │ 1 │ A │
│ 2   │ 2 │ A │
│ 3   │ 1 │ B │
│ 4   │ 2 │ B │

Or just a simple matrix / table is fine!

tnederlof · September 12, 2018, 9:47pm

I think this would work for what you are trying to do.

vector = vec(collect(Base.product([1,2],["A", "B"])))
df = DataFrame(map(x -> getindex.(vector, x), eachindex(first(vector))))

kristoffer.carlsson · September 12, 2018, 9:47pm

Seems you want an “unzip” iterator; see the Base.unzip() issue, #13942 in JuliaLang/julia on GitHub.

This is an ugly way heh

julia> v = vec(collect(Base.product([1,2],["A", "B"])))
4-element Array{Tuple{Int64,String},1}:
 (1, "A")
 (2, "A")
 (1, "B")
 (2, "B")

julia> DataFrame(collect.(collect(zip(v...))))
4×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ A  │
│ 2   │ 2  │ A  │
│ 3   │ 1  │ B  │
│ 4   │ 2  │ B  │
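
One possible way to get the same columns without splatting the whole vector into `zip` is to index every tuple position in turn. This is only a sketch: `unzip_columns` is a hypothetical helper name, not a function from Base or DataFrames, and it assumes every tuple in the vector has the same length.

```julia
# Hypothetical helper: split a vector of N-tuples into N column vectors
# by broadcasting getindex over the rows, once per tuple position.
unzip_columns(rows) = ntuple(i -> getindex.(rows, i), length(first(rows)))

v = vec(collect(Iterators.product([1, 2], ["A", "B"])))
xs, ys = unzip_columns(v)
```

The two vectors can then be handed to the DataFrame keyword constructor, for example `DataFrame(X = xs, Y = ys)`, which should also give the column names asked for above.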

mkborregaard · September 13, 2018, 6:21am

unzip would be so sweet to have tho

DominiqueMakowski · September 13, 2018, 6:38am

Thanks, I would have never found the solution myself

Tamas_Papp · September 13, 2018, 7:13am

A trivial package (with unit tests etc) could solve this very quickly.

DominiqueMakowski · September 13, 2018, 7:29am

Well, it seems that in my use case there is a collect or similar missing somewhere:

I am trying to apply a function (datagrid) to each column of a dataframe, then create a new dataframe which would be the “product” (all combinations) of the previous values.

using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

# Data
df = DataFrame(X=[0,2], Y=[10, 20])

X = colwise(x -> datagrid(x, n=3), df)

Unfortunately, if I apply the previous method, it seems to be missing one unzipping / collecting step, which I am not sure how to insert:

X = vec(collect(Base.product(X)))
X = DataFrame(collect.(collect(zip(X...))))

2×1 DataFrame
│ Row │ x1                 │
├─────┼────────────────────┤
│ 1   │ [0.0, 1.0, 2.0]    │
│ 2   │ [10.0, 15.0, 20.0] │

Thank you

oheil · September 13, 2018, 9:20am

It is not clear to me what the desired result of

collect(Base.product(X))

should be.
With your code it is

julia> collect(Base.product(X))
2-element Array{Tuple{Array{Float64,1}},1}:
 ([0.0, 1.0, 2.0],)
 ([10.0, 15.0, 20.0],)

But maybe it should be

julia> X=collect(Base.product(X...))
3×3 Array{Tuple{Float64,Float64},2}:
 (0.0, 10.0)  (0.0, 15.0)  (0.0, 20.0)
 (1.0, 10.0)  (1.0, 15.0)  (1.0, 20.0)
 (2.0, 10.0)  (2.0, 15.0)  (2.0, 20.0)

and then yields:

julia> X = DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │

I am just guessing here.
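
The difference between the two calls comes down to splatting. A small sketch with the same values, where `cols` is a stand-in name for the vector of grids returned by `colwise`: without `...` the outer vector is treated as a single iterator and each grid becomes one element of a 1-tuple, while with `...` every grid is passed as its own argument and the product runs over their elements.

```julia
cols = [[0.0, 1.0, 2.0], [10.0, 15.0, 20.0]]

# One argument: the product of a single iterator, giving 2 one-element tuples.
unsplatted = collect(Iterators.product(cols))

# Two arguments: the Cartesian product of the two grids, a 3×3 matrix of pairs.
splatted = collect(Iterators.product(cols...))
```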

DominiqueMakowski · September 13, 2018, 9:22am

perfect guess, that was what I wanted, thanks a lot

oheil · September 13, 2018, 9:25am

I forgot the vec in

julia> X=vec(collect(Base.product(X...)))
9-element Array{Tuple{Float64,Float64},1}:
 (0.0, 10.0)
 (1.0, 10.0)
 (2.0, 10.0)
 (0.0, 15.0)
 (1.0, 15.0)
 (2.0, 15.0)
 (0.0, 20.0)
 (1.0, 20.0)
 (2.0, 20.0)

but the end result is the same:

julia> X=DataFrame(collect.(collect(zip(X...))))
9×2 DataFrame
│ Row │ x1  │ x2   │
├─────┼─────┼──────┤
│ 1   │ 0.0 │ 10.0 │
│ 2   │ 1.0 │ 10.0 │
│ 3   │ 2.0 │ 10.0 │
│ 4   │ 0.0 │ 15.0 │
│ 5   │ 1.0 │ 15.0 │
│ 6   │ 2.0 │ 15.0 │
│ 7   │ 0.0 │ 20.0 │
│ 8   │ 1.0 │ 20.0 │
│ 9   │ 2.0 │ 20.0 │

DominiqueMakowski · September 13, 2018, 9:50am

It seems that the last step doesn’t work when there are three variables (it takes forever to compute):

These first steps work as expected:

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))

27-element Array{Tuple{Float64,Float64,Float64},1}:
 (1.0, 5.0, 8.0)
 (2.0, 5.0, 8.0)
 ...
 (2.0, 6.0, 9.0)
 (3.0, 6.0, 9.0)

However, the last line:

grid = DataFrame(collect.(collect(zip(grid...))))

gets stuck, and the problem is apparently related to zip(grid...) that never ends computing…

PS: the desired output would be a 27×3 DataFrame with each element of the tuple as a different column (the same as above but with 3 columns). Hope I am clear enough…

oheil · September 13, 2018, 10:16am

This seems to be a bug in julia 1.0.0

julia 0.6.4:

julia> a=1:30
1:30

julia> collect(zip(a...))
0-dimensional Array{NTuple{30,Int64},0}:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)

julia 1.0.0:

julia> a=1:30
1:30

julia> zip(a...)

takes forever (at least several minutes; I didn’t wait for it to return to the REPL).

It seems to be a known issue:

https://github.com/JuliaLang/julia/pull/27415

I have commented in:
https://github.com/JuliaLang/julia/issues/26765#issuecomment-421008005

oheil · September 13, 2018, 11:04am

This is how it works nicely in julia 0.6.4:

using DataFrames

function datagrid(X::Array{<:Number}; n::Int=10, kwargs...)
    collect(range(minimum(X), (maximum(X)-minimum(X))/(n-1), n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = getindex.(colwise(x -> datagrid(Array(x), n=3), df),1)
grid = vec(collect(Base.product(grid...)))
grid = DataFrame(collect.(collect(zip(grid...))))

27×3 DataFrames.DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │

oheil · September 13, 2018, 11:16am

This is a julia 1.0.0 suggestion avoiding zip, only the last line changed:

using DataFrames

# Function to get a linear range
function datagrid(X::Vector{<:Number}; n::Int=10, kwargs...)
    X = collect(range(minimum(X), stop=maximum(X), length=n))
end

df = DataFrame(y=[1,3], x=[5,6], z=[8,9])
grid = colwise(x -> datagrid(x, n=3), df)
grid = vec(collect(Base.product(grid...)))

grid = DataFrame(collect.([ getindex.(grid,t) for t in 1:length(grid[1]) ]))

27×3 DataFrame
│ Row │ x1  │ x2  │ x3  │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.0 │ 5.0 │ 8.0 │
│ 2   │ 2.0 │ 5.0 │ 8.0 │
│ 3   │ 3.0 │ 5.0 │ 8.0 │
│ 4   │ 1.0 │ 5.5 │ 8.0 │
│ 5   │ 2.0 │ 5.5 │ 8.0 │
│ 6   │ 3.0 │ 5.5 │ 8.0 │
│ 7   │ 1.0 │ 6.0 │ 8.0 │
│ 8   │ 2.0 │ 6.0 │ 8.0 │
│ 9   │ 3.0 │ 6.0 │ 8.0 │
│ 10  │ 1.0 │ 5.0 │ 8.5 │
│ 11  │ 2.0 │ 5.0 │ 8.5 │
│ 12  │ 3.0 │ 5.0 │ 8.5 │
│ 13  │ 1.0 │ 5.5 │ 8.5 │
│ 14  │ 2.0 │ 5.5 │ 8.5 │
│ 15  │ 3.0 │ 5.5 │ 8.5 │
│ 16  │ 1.0 │ 6.0 │ 8.5 │
│ 17  │ 2.0 │ 6.0 │ 8.5 │
│ 18  │ 3.0 │ 6.0 │ 8.5 │
│ 19  │ 1.0 │ 5.0 │ 9.0 │
│ 20  │ 2.0 │ 5.0 │ 9.0 │
│ 21  │ 3.0 │ 5.0 │ 9.0 │
│ 22  │ 1.0 │ 5.5 │ 9.0 │
│ 23  │ 2.0 │ 5.5 │ 9.0 │
│ 24  │ 3.0 │ 5.5 │ 9.0 │
│ 25  │ 1.0 │ 6.0 │ 9.0 │
│ 26  │ 2.0 │ 6.0 │ 9.0 │
│ 27  │ 3.0 │ 6.0 │ 9.0 │

Nosferican · September 13, 2018, 11:17am

This is implemented in QuantEcon

julia> using QuantEcon, DataFrames

julia> DataFrame(gridmake([1,2,3], [4,5,6]))
9×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 4  │
│ 2   │ 2  │ 4  │
│ 3   │ 3  │ 4  │
│ 4   │ 1  │ 5  │
│ 5   │ 2  │ 5  │
│ 6   │ 3  │ 5  │
│ 7   │ 1  │ 6  │
│ 8   │ 2  │ 6  │
│ 9   │ 3  │ 6  │

QuantEcon is available for ≥ 0.6 (includes 1.0). If you don’t want to use the package, my suggested implementation is

function gridmake(arrays::AbstractVecOrMat...)
    l = size.(arrays, 1)
    nrows = prod(l)
    output = mapreduce(a_o -> repeat(a_o[1],
                                     inner = (a_o[2], 1),
                                     outer = (div(nrows, size(a_o[1], 1) * a_o[2]), 1)),
                       hcat,
                       zip(arrays, cumprod(prepend!(collect(l[1:end - 1]), 1))))
    return output
end

Note: Many of the implementations suggested above only work for AbstractVector and give the wrong result for AbstractMatrix. As a test case you can use,

using QuantEcon: gridmake
using BenchmarkTools: @btime

const x, y, z = 1:3, [10 20; 30 40], [100, 200];

arrays = [x, y, z];

# Put the implementation under test here; it is called `magic` below
magic(arrays...) == gridmake(arrays...) # They match
@btime magic($arrays...) # Check efficiency

sjmgarnier · May 28, 2021, 6:24am

I know this is an old question but in case someone is still looking for a solution that works like the R expand.grid function (i.e. passing a list of named variables of any type and returning a data frame with the variable names as column names, each column of the type of the original variable, and all possible combinations of the different variables), this is my Julia-newbie attempt at it:

using DataFrames

function expand_grid(; iters...)
    var_names = collect(keys(iters))
    var_itr = [1:length(x) for x in iters.data]
    var_ix = vcat([collect(x)' for x in Iterators.product(var_itr...)]...)
    out = DataFrame()
    for i = 1:length(var_names)
        out[:,var_names[i]] = collect(iters[i])[var_ix[:,i]]
    end
    return out
end

expand_grid(a=1:2, b=1.0:5.0, c=["one", "two", "three", "four"])

There is most likely a more efficient or cleaner way to do this but this is the best I could come up with that would give me what I expect from the R function.
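
For comparison, here is a shorter sketch of the same idea that relies only on Base. It is not a drop-in replacement for the function above: `expand_grid_columns` is a hypothetical name, it returns a NamedTuple of column vectors rather than a DataFrame, and it assumes at least one keyword argument is given.

```julia
# Hypothetical helper: one column per keyword argument, one row per combination.
function expand_grid_columns(; vars...)
    nt = values(vars)                  # NamedTuple holding the keyword arguments
    names = keys(nt)                   # tuple of Symbols, in call order
    rows = vec(collect(Iterators.product(nt...)))
    # Pull out tuple position i from every row to form column i.
    cols = ntuple(i -> getindex.(rows, i), length(names))
    return NamedTuple{names}(cols)
end

g = expand_grid_columns(a = 1:2, b = 1.0:5.0, c = ["one", "two", "three", "four"])
```

Each column keeps the element type of its input, and the first variable varies fastest. A NamedTuple of equal-length vectors is a column table, so passing `g` to `DataFrame` should work in recent DataFrames versions, although that step is not exercised here.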
