# Creating a 3d frequency array for categorical variables from a dataframe

I have a dataframe with three categorical variables `a`, `b`, `c`, taking values in 1:2, 1:5, and 1:7 respectively. I then calculate the number of rows with each of the possible 2*5*7=70 combinations of a,b,c:

``````
df = transform(groupby(df, [:a,:b,:c]), nrow => :count)
df = unique(df[:, [:a, :b, :c, :count]])
``````

Now I want to create a 3d 2*5*7 array called `frequency` such that `frequency[i,j,k]` is equal to the value of `count` in the row corresponding to a=i,b=j,c=k.

How can I do this? I suspect the `unstack` function may help but I don’t understand that function well.

Here’s what I would do using DataFramesMeta and Chain.jl:

``````
julia> using Chain, DataFramesMeta;

julia> N = 1000;  # sample size; assumed here, the original post left N undefined

julia> df = DataFrame(a = rand(1:2, N), b = rand(1:5, N), c = rand(1:7, N));

julia> df_count = @chain df begin
           groupby([:a, :b, :c])
           @combine(:count = length(:a))  # rows per observed (a, b, c) combination
       end;

julia> combinations = reshape(collect(Iterators.product(1:2, 1:5, 1:7)), :);  # all 70 combinations

julia> df_full = @chain combinations begin
           DataFrame
           rename!([:a, :b, :c])
       end;

julia> df_fraction = @chain df_full begin
           leftjoin(df_count, on = [:a, :b, :c])
           @transform(:count = coalesce.(:count, 0))  # broadcast: unobserved combinations get 0
           @transform(:fraction = :count ./ sum(:count))
       end;
``````

Thanks, but I’m not seeing how to access the frequency at particular indices. Suppose I want to know the frequency with which `a=1,b=2,c=3` occurs. I guess I could convert the dataframe into a matrix, loop over every row, and say: if `matrix[_,1] == 1 && matrix[_,2] == 2 && matrix[_,3] == 3`, then give me `matrix[_,4]`. But I figured there would be a more systematic way.

Ah yes I see now.

This is a commonly requested feature in DataFrames. There is no way to index columns for an easy lookup like that.

You can make a `GroupedDataFrame` grouping on `[:a, :b, :c]` and then index like

``````
gd[(a = 1, b = 2, c = 3)]
``````

This will be fast since a `GroupedDataFrame` creates a hash for lookup just like a `Dict`.

However this will return a `SubDataFrame`, which is itself not the nicest object.
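As a small illustration of that lookup, here is a minimal sketch on a hypothetical toy table `counts` (the data and variable names are made up for the example, not taken from the thread). It shows the keyed lookup with a `NamedTuple`, the equivalent positional `Tuple` key, and a `haskey` check for a combination that does not occur in the data.

```julia
using DataFrames

# Hypothetical toy table of counts (illustrative values only)
counts = DataFrame(a = [1, 1, 2], b = [2, 3, 5], c = [3, 3, 7], count = [4, 1, 2])
gd = groupby(counts, [:a, :b, :c])

# Keyed lookup with a NamedTuple; the result is a one-row SubDataFrame
sub = gd[(a = 1, b = 2, c = 3)]

# The same group, looked up with a positional Tuple key
sub_t = gd[(1, 2, 3)]

# Pull the scalar out of the one-row result
n = only(sub.count)

# A combination that never occurs has no group, so check before indexing
present = haskey(gd, (a = 2, b = 1, c = 1))
```

Indexing with a key that is not present throws an error, which is one reason to join against the full set of combinations first, or to guard the lookup with `haskey`.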

I would do the following

``````
julia> gd = groupby(df_fraction, [:a, :b, :c]);

julia> function get_fraction(gd; a = nothing, b = nothing, c = nothing)
           only(gd[(; a, b, c)]).fraction
       end
get_fraction (generic function with 1 method)

julia> get_fraction(gd; a = 1, b = 2, c = 3)
0.018
``````
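If a plain array is what you are after, as in the original question, the counts can also be written straight into a preallocated 2×5×7 array. The following is a minimal sketch, assuming the three columns hold integers within 1:2, 1:5 and 1:7; the toy data is hypothetical and only there to make the example self-contained.

```julia
using DataFrames

# Hypothetical toy data; a in 1:2, b in 1:5, c in 1:7
df = DataFrame(a = [1, 1, 2, 2, 2],
               b = [1, 1, 5, 3, 3],
               c = [7, 7, 1, 4, 4])

# One row per observed (a, b, c) combination, with its row count
counts = combine(groupby(df, [:a, :b, :c]), nrow => :count)

# Start from zeros so combinations that never occur stay at 0
frequency = zeros(Int, 2, 5, 7)
for row in eachrow(counts)
    frequency[row.a, row.b, row.c] = row.count
end
```

After this, `frequency[i, j, k]` is the number of rows with a=i, b=j, c=k, and `frequency ./ sum(frequency)` gives the corresponding fractions. No join against the full set of combinations is needed, because the array is initialised with zeros.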