# Generate a dataset

Sorry if the title isn't quite appropriate, but here is the problem. I have some arrays of categorical or boolean values, and I want to form all combinations of their elements and collect them into an array. It may also be the case that each combination has its own unique value (in the example, these are in the `z` vector). The following attempt worked, but honestly I don't like how I did it. Is there a simpler way?

Suppose I have

``````x = [false, true];
y = ['A', 'B'];
z = rand(4);
``````

And I want to produce a two-dimensional array like the following:

``````4×3 Array{Any,2}:
false  'A'  0.961038
true  'A'  0.518147
false  'B'  0.210022
true  'B'  0.543537
``````

Here is what I tried

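``````A = [(i, j) for i in x, j in y];
# one (x, y, z) tuple per combination, pairing the i-th combination with z[i]
B = [(A[i]..., z[i]) for i in 1:length(A)];

# build the result row by row as 1×3 slices
C = reshape(collect(Any, B[1]), 1, 3);
for j in 2:length(B)
    global C  # needed when run outside a function
    C = vcat(C, reshape(collect(Any, B[j]), 1, 3))
end
C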

``````

You are looking for `Iterators.product`.

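``````using DataFrames

x = [false, true];
y = ['A', 'B'];
z = rand(4);

# declare the column types up front so plain tuples can be pushed as rows
df = DataFrame(a = Bool[], b = Char[], c = Float64[])
for p in Iterators.product(x, y, z)  # every (x, y, z) combination
    push!(df, p)
end

Matrix(df)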
``````
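To see what `Iterators.product` actually yields, here is a small sketch using only Base (the names `pairs_xy` and `triples` are made up for illustration). The first argument varies fastest, and putting `z` into the product gives every `(x, y, z)` combination rather than one `z` value per `(x, y)` pair.

```julia
x = [false, true]
y = ['A', 'B']
z = rand(4)

# product(x, y) has the shape of a 2×2 array; vec flattens it column-major,
# so the first argument varies fastest.
pairs_xy = vec(collect(Iterators.product(x, y)))
# pairs_xy == [(false, 'A'), (true, 'A'), (false, 'B'), (true, 'B')]

# Adding z to the product yields 2 * 2 * 4 = 16 tuples, not 4.
triples = vec(collect(Iterators.product(x, y, z)))
length(triples)  # 16
```

So if each `(x, y)` combination should get exactly one entry of `z`, the product should be taken over `x` and `y` only, with `z` attached afterwards.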

Hi @pdeffebach, thanks a lot for the reply. This looks way simpler than mine. So I need to create a DataFrame? Could it be done independently of DataFrames? Also, I just tried it and the code didn't work. Maybe I don't know the details of `Iterators.product`; I can check.

Sorry, edited the above so it works. It looks like you have to declare the columns first if you are pushing just `Tuple`s instead of `NamedTuples`.

You don’t have to use `DataFrames`, but it's a convenient thing to do, plus you said in your post that you are interested in “creating a dataset”.

You can do the same process by initializing an array of `Any` with 3 columns and 0 rows.

``````julia> df = Array{Any}(undef, 0, 3)
julia> for p in Iterators.product(x, y, z)
    global df  # because I'm working in the REPL, I need to declare global in the loop
    df = vcat(df, permutedims(collect(p)))
end
``````

Hi Peter, thanks, this actually works. This is what I wanted:

``````df = Array{Any}(undef, 0, 2)
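for p in Iterators.product(x, y)
    # append each (x, y) combination as a 1×2 row
    df = vcat(df, permutedims(collect(p)))
end
# attach z as the third column; z[i] lines up with the i-th combination
hcat(df, z)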
``````
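For reference, the same 4×3 result can also be built without growing the array inside a loop. This is only a sketch using Base functions, not the method from the replies above; the names `combos`, `rows`, and `result` are made up for illustration.

```julia
x = [false, true]
y = ['A', 'B']
z = rand(4)

# All (x, y) combinations as a flat vector, first argument varying fastest.
combos = vec(collect(Iterators.product(x, y)))

# Pair the i-th combination with z[i] and turn each into a 1×3 row of Any.
rows = [Any[a b c] for ((a, b), c) in zip(combos, z)]

# Stack the rows into a 4×3 Matrix{Any}.
result = reduce(vcat, rows)
```

Building all rows first and stacking them once avoids reallocating the matrix on every iteration, which matters more as the number of combinations grows.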