Generate a dataset

Sorry if the title isn't quite appropriate, but here is the problem. I have some arrays of categorical or boolean values, and I want to form every combination of their elements and collect the combinations into an array. It could also be the case that each combination has its own unique value (the z vector in the example below). I tried the following and it worked, but honestly I don't like the way I did it. Is there a simpler way?

Suppose I have

x = [false, true];
y = ['A', 'B'];
z = rand(4);

And I want to produce a two-dimensional array like the following

4×3 Array{Any,2}:
 false  'A'  0.961038
  true  'A'  0.518147
 false  'B'  0.210022
  true  'B'  0.543537

Here is what I tried

A = [(i, j) for i in x, j in y];
B = [collect(zip(A[i]..., z[i])) for i in 1:length(A)];

C =  reshape([i for i in (B[1]...)], 1, 3);
for j in 2:length(B)
    C = vcat(C, reshape([i for i in (B[j]...)], 1, 3))
end
C

You are looking for Iterators.product.

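using DataFrames

x = [false, true];
y = ['A', 'B'];
z = rand(4);

# Declare the column types up front so plain tuples can be pushed as rows
df = DataFrame(a = Bool[], b = Char[], c = Float64[])
# Note: the product of x, y and z has 2 * 2 * 4 = 16 rows
for p in Iterators.product(x, y, z)
    push!(df, p)
end

Matrix(df)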
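To see what Iterators.product actually yields, here is a minimal sketch using only Base. The names combos, combo_list and n_with_z are made up for illustration. It shows the shape and ordering of the two-way product, and how the count grows once a third iterable is added.

```julia
x = [false, true]
y = ['A', 'B']

# product(x, y) is lazy; collect materializes it as a 2×2 matrix of tuples
combos = collect(Iterators.product(x, y))

# vec flattens in column-major order, so the first argument (x) varies fastest
combo_list = vec(combos)

# every extra iterable multiplies the number of tuples: 2 * 2 * 4 here
n_with_z = length(Iterators.product(x, y, rand(4)))
```

The ordering of combo_list matches the rows of the desired 4×3 output in the question, which is why a length-4 vector can later be lined up against it.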

Hi @pdeffebach, thanks a lot for the reply and the solution. This looks way simpler than mine. So I need to create a DataFrame? Could it be done independently of DataFrames? I just tried the code and it didn't work, though; maybe I don't know the details of Iterators.product. I can check.

Sorry, I edited the code above so it works. It looks like you have to declare the columns first if you are pushing plain Tuples instead of NamedTuples.

You don’t have to use DataFrames, but it’s a convenient thing to do, plus you said in your post that you are interested in “creating a dataset”.

You can do the same thing by initializing an array of Any with 3 columns and 0 rows.

julia> df = Array{Any}(undef, 0, 3)
julia> for p in Iterators.product(x, y, z)
           global df # because I'm working in the REPL I need to declare global in the loop
           df = vcat(df, permutedims(collect(p)))
       end
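The row-building step in that loop can be looked at in isolation. The sketch below, with the hypothetical names p, v, row, rows and M and a fixed z in place of rand(4), shows what collect and permutedims do to one tuple. It then stacks all rows with a single reduce(vcat, ...) instead of growing the matrix inside the loop. This is one possible variant, not the only way to do it.

```julia
x = [false, true]
y = ['A', 'B']
z = [0.1, 0.2, 0.3, 0.4]   # fixed values stand in for rand(4)

p = (false, 'A', 0.1)
v = collect(p)             # 3-element Vector{Any}, since the entry types differ
row = permutedims(v)       # 1×3 Matrix{Any}, the shape vcat needs for a row

# Build every row first, then concatenate once
rows = vec([permutedims(collect(q)) for q in Iterators.product(x, y, z)])
M = reduce(vcat, rows)
```

Concatenating once avoids copying the accumulated matrix on every iteration, and it sidesteps the global declaration because nothing is reassigned inside a loop.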

Hi Peter, thanks, this actually works. This is what I wanted:

df = Array{Any}(undef, 0, 2)
for p in Iterators.product(x, y)
    df = vcat(df, permutedims(collect(p)))
end
hcat(df, z)
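For comparison, the same 4×3 result can be sketched without a loop or repeated vcat, using only Base. The names combos, rows and C are illustrative, and the sketch assumes z has exactly one entry per combination of x and y.

```julia
x = [false, true]
y = ['A', 'B']
z = rand(4)

# The 4 combinations as a flat vector, with x varying fastest
combos = vec(collect(Iterators.product(x, y)))

# Pair each combination with its own z value
rows = [(a, b, c) for ((a, b), c) in zip(combos, z)]

# Lay the tuples out as a 4×3 matrix; Any allows mixed column types
C = Any[r[k] for r in rows, k in 1:3]
```

Because zip stops at the shorter input, a z of the wrong length would silently drop rows, so checking length(z) == length(combos) beforehand may be worthwhile.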