I have a dataframe with three categorical variables a, b, c, taking values in 1:2, 1:5, and 1:7 respectively. Then I calculate the number of rows with each of the possible 2*5*7=70 combinations of a, b, c:
Now I want to create a 3D 2*5*7 array called frequency such that frequency[i,j,k] is equal to the value of count in the row corresponding to a=i, b=j, c=k.
How can I do this? I suspect the unstack function may help but I don’t understand that function well.
Thanks, but I’m not seeing how to access the frequency at particular indices. Suppose I want to know the frequency with which a=1, b=2, c=3 occurs. I guess I could convert the dataframe into a matrix, loop over every row, and whenever matrix[_,1] == 1 && matrix[_,2] == 2 && matrix[_,3] == 3, return matrix[_,4]. But I figured there would be a more systematic way.
This is a commonly requested feature in DataFrames. There is no way to index columns for an easy lookup like that.
You can make a GroupedDataFrame grouping on [:a, :b, :c] and then index like
gd[(a = 1, b = 2, c = 3)]
This will be fast since a GroupedDataFrame creates a hash for lookup just like a Dict.
However this will return a SubDataFrame, which is itself not the nicest object.
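As a rough sketch of that lookup (the toy table `df` and its values are made up for illustration; only the column names a, b, c, count come from the question), indexing the GroupedDataFrame with a NamedTuple key returns the matching group, and the count can then be pulled out of it:

```julia
using DataFrames

# Hypothetical toy data, one row per (a, b, c) combination
df = DataFrame(a = [1, 1, 2], b = [2, 3, 2], c = [3, 3, 7], count = [10, 4, 6])

gd = groupby(df, [:a, :b, :c])

# Indexing by key gives a SubDataFrame containing the rows of that group
sub = gd[(a = 1, b = 2, c = 3)]

# Each group has exactly one row here, so `only` extracts the single value
n = only(sub.count)
```

A key that does not occur in the data throws a KeyError, so `haskey(gd, key)` may be worth checking first if some combinations can be missing.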
I would do the following
julia> gd = groupby(df_fraction, [:a, :b, :c]);
julia> function get_fraction(gd; a = nothing, b = nothing, c = nothing)
           # Look up the single row for this (a, b, c) key and return its fraction
           only(gd[(; a, b, c)]).fraction
       end
get_fraction (generic function with 1 method)
julia> get_fraction(gd; a = 1, b = 2, c = 3)
0.018
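For the original question about a 2*5*7 array, one possible approach is to preallocate the array and fill it row by row. This is only a sketch: `df_count` and its values are hypothetical, and it assumes a, b, c are stored as plain integers starting at 1 (a CategoricalArray column would first need converting to integer codes).

```julia
using DataFrames

# Hypothetical counts table; combinations that never occur are simply absent
df_count = DataFrame(a = [1, 1, 2], b = [2, 3, 5], c = [3, 3, 7], count = [10, 4, 6])

# Start from zeros so missing combinations get a frequency of 0
frequency = zeros(Int, 2, 5, 7)

# Use each row's (a, b, c) values directly as array indices
for row in eachrow(df_count)
    frequency[row.a, row.b, row.c] = row.count
end

frequency[1, 2, 3]  # frequency for a=1, b=2, c=3
```

After this, frequency[i,j,k] is a plain array lookup, at the cost of holding all 70 cells in memory whether or not the combination occurs.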