Add missing row or column to frequency table

Hi all,

In my simulation, the results are summarized as a 2×2 frequency table. For example:

using DataFrames, FreqTables
df1 = DataFrame(
    a = [:yes,:yes,:no,:no],
    b = [:yes,:no,:yes,:no]
)
t1 = freqtable(df1, :a, :b)

Output:

2×2 Named Matrix{Int64}
a ╲ b │  :no  :yes
──────┼───────────
:no   │    1     1
:yes  │    1     1

In some cases, every cell in a row or column has a count of zero, and that row or column is dropped from the table. For example:

df2 = DataFrame(
    a = [:yes,:yes,:no,:no],
    b = [:no,:no,:no,:no]
)
t2 = freqtable(df2, :a, :b)

Output:

a ╲ b │ :no
──────┼────
:no   │   2
:yes  │   2

Is there an easy way to add a row or column of zeros in such cases?

Thank you.

Yes, but a and b need to be CategoricalArrays:

julia> using DataFrames, FreqTables, CategoricalArrays;

julia> x = categorical(["no", "no", "yes"])[1:2]; y = categorical(["up", "down", "up"])[1:2];

julia> df = DataFrame(x = x, y = y);

julia> freqtable(df, :x, :y)
2×2 Named Matrix{Int64}
                                 x ╲ y │ CategoricalValue{String, UInt32} "down"    CategoricalValue{String, UInt32} "up"
───────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────
CategoricalValue{String, UInt32} "no"  │                                       1                                        1
CategoricalValue{String, UInt32} "yes" │                                       0                                        0

With some pretty unfortunate printing, I guess.

Thank you. Unfortunately, that does not work with my example:

df2 = DataFrame(
    a = categorical(["yes","yes","no","no"]),
    b = categorical(["no","no","no","no"])
)
t2 = freqtable(df2, :a, :b)

Output:

2×1 Named Matrix{Int64}
                                 a ╲ b │ CategoricalValue{String, UInt32} "no"
───────────────────────────────────────┼──────────────────────────────────────
CategoricalValue{String, UInt32} "no"  │                                     2
CategoricalValue{String, UInt32} "yes" │                                     2

No, it does not. As you can see above, I initially created an array that contained "yes" and then subsetted it, so "yes" was still in the pool of levels of x. You can also set the levels explicitly:


julia> x = categorical(["no", "no"], levels = ["yes", "no"])
2-element CategoricalArray{String,1,UInt32}:
 "no"
 "no"

julia> freqtable(x)
2-element Named Vector{Int64}
Dim1                                   │ 
───────────────────────────────────────┼──
CategoricalValue{String, UInt32} "yes" │ 0
CategoricalValue{String, UInt32} "no"  │ 2

Please be sure to read the documentation for categorical (type ?categorical in the REPL) for more details.
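
If the categorical vector already exists, the same effect can probably be had by adding the missing level afterwards. This is a minimal sketch, assuming the levels! function from CategoricalArrays; the variable name b is only for illustration.

```julia
using CategoricalArrays, FreqTables

# Only "no" is observed, so "no" is the only level at first.
b = categorical(["no", "no", "no", "no"])

# Add "yes" to the pool of levels; the data itself is unchanged.
levels!(b, ["no", "yes"])

# The table now keeps an entry for the unobserved level.
t = freqtable(b)

# Plain counts behind the named vector, in level order ("no", "yes").
counts = t.array
```

The new level list has to contain every level that is actually used in the data, otherwise levels! throws an error.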

Thank you. I understand now. It was not clear to me why you subsetted the array. I will post a working solution for my specific question below for future reference. Thanks again.

using DataFrames, FreqTables, CategoricalArrays
df2 = DataFrame(
    a = categorical(["yes","yes","no","no"]; levels=["yes","no"]),
    b = categorical(["no","no","no","no"]; levels=["yes","no"])
)
t2 = freqtable(df2, :a, :b)
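
To confirm that the zero column is really there, the counts can be inspected directly. This is a small sketch, assuming the table returned by freqtable is a NamedArray whose raw counts are stored in its array field.

```julia
using DataFrames, FreqTables, CategoricalArrays

df2 = DataFrame(
    a = categorical(["yes", "yes", "no", "no"]; levels = ["yes", "no"]),
    b = categorical(["no", "no", "no", "no"]; levels = ["yes", "no"]),
)
t2 = freqtable(df2, :a, :b)

# Raw count matrix: rows follow the levels of a, columns the levels of b.
counts = t2.array

# The first column belongs to b = "yes", which never occurs in the data.
yes_column_is_zero = all(==(0), counts[:, 1])
```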

Does anyone know if this is possible outside of DataFrames? Consider the following:

julia> using FreqTables

julia> g = prop(freqtable([1, 2, 2, 3, 4, 3], [5, 6, 6, 7, 9, 8]), margins=1)

4×5 Named Matrix{Float64}
Dim1 ╲ Dim2 │   5    6    7    8    9
────────────┼────────────────────────
1           │ 1.0  0.0  0.0  0.0  0.0
2           │ 0.0  1.0  0.0  0.0  0.0
3           │ 0.0  0.0  0.5  0.5  0.0
4           │ 0.0  0.0  0.0  0.0  1.0

What I would like it to return instead is:

9×9 Named Matrix{Float64}
Dim1 ╲ Dim2 │   1    2    3    4    5    6    7    8    9
────────────┼────────────────────────────────────────────
1           │ 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
2           │ 0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
3           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0
4           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
5           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
6           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
7           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
8           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9           │ 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

I can write a wrapper function that re-inserts the 0s on the back-end, but was just wondering if another utility already provides this feature.

You can use CategoricalArrays for this:

julia> using FreqTables, CategoricalArrays

julia> x = [1, 2, 2, 3, 4, 3]; y = [5, 6, 6, 7, 9, 8];

julia> x_c = categorical(x; levels = 1:9);

julia> y_c = categorical(y; levels = 1:9);

julia> prop(freqtable(x_c, y_c))
9×9 Named Matrix{Float64}
Dim1 ╲ Dim2 │        1         2         3  …         7         8         9
────────────┼──────────────────────────────────────────────────────────────
1           │      0.0       0.0       0.0  …       0.0       0.0       0.0
2           │      0.0       0.0       0.0          0.0       0.0       0.0
3           │      0.0       0.0       0.0     0.166667  0.166667       0.0
4           │      0.0       0.0       0.0          0.0       0.0  0.166667
5           │      0.0       0.0       0.0          0.0       0.0       0.0
6           │      0.0       0.0       0.0          0.0       0.0       0.0
7           │      0.0       0.0       0.0          0.0       0.0       0.0
8           │      0.0       0.0       0.0          0.0       0.0       0.0
9           │      0.0       0.0       0.0  …       0.0       0.0       0.0
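
Note that the output above is for prop without margins, so each cell is a share of the whole table. For the row-wise proportions asked about in the question, the rows with no observations have a row sum of zero, so dividing by the row sum would not give the 0.0 shown in the desired output. One possible workaround, sketched here on the plain count matrix and assuming that empty rows should simply stay at 0.0, is to guard the division:

```julia
using FreqTables, CategoricalArrays

x = [1, 2, 2, 3, 4, 3]
y = [5, 6, 6, 7, 9, 8]

t = freqtable(categorical(x; levels = 1:9), categorical(y; levels = 1:9))

# Work on the raw 9×9 count matrix stored inside the named table.
counts = t.array

# Row sums as a 9×1 matrix; empty rows sum to 0.
rowsums = sum(counts, dims = 2)

# Divide by at least 1 so that empty rows become 0.0 rather than 0/0.
rowprops = counts ./ max.(rowsums, 1)
```

The result is a plain Matrix{Float64}, so the row and column names of the table are not carried over.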

Thanks! My understanding was that CategoricalArrays only worked with DataFrames, but I should have read more before posting my question. I appreciate it!

No. CategoricalArrays.jl is entirely distinct from DataFrames.jl. In fact, it is not even a dependency of DataFrames.jl.
