Add missing row or column to frequency table

Hi all,

In my simulation, the results are summarized as a 2×2 frequency table. For example:

using DataFrames, FreqTables 
df1 = DataFrame(
    a = [:yes,:yes,:no,:no],
    b = [:yes,:no,:yes,:no]
)
t1 = freqtable(df1, :a, :b)

Output:

2×2 Named Matrix{Int64}
a ╲ b │  :no  :yes
──────┼───────────
:no   │    1     1
:yes  │    1     1

In some cases, every cell in a row or column has a count of zero, and that row or column is then missing from the table. For example:

df2 = DataFrame(
    a = [:yes,:yes,:no,:no],
    b = [:no,:no,:no,:no]
)
t2 = freqtable(df2, :a, :b)

Output:

a ╲ b │ :no
──────┼────
:no   │   2
:yes  │   2

Is there an easy way to add a row or column of zeros in such cases?

Thank you.

Yes, but a and b need to be CategoricalArrays whose levels include the unobserved category:

julia> using DataFrames, FreqTables, CategoricalArrays;

julia> x = categorical(["no", "no", "yes"])[1:2]; y = categorical(["up", "down", "up"])[1:2];

julia> df = DataFrame(x = x, y = y);

julia> freqtable(df, :x, :y)
2×2 Named Matrix{Int64}
                                 x ╲ y │ CategoricalValue{String, UInt32} "down"    CategoricalValue{String, UInt32} "up"
───────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────
CategoricalValue{String, UInt32} "no"  │                                       1                                        1
CategoricalValue{String, UInt32} "yes" │                                       0                                        0

With some pretty unfortunate printing, I guess.

Thank you. Unfortunately, that does not work with my example:

df2 = DataFrame(
    a = categorical(["yes","yes","no","no"]),
    b = categorical(["no","no","no","no"])
)
t2 = freqtable(df2, :a, :b)

Output:

2×1 Named Matrix{Int64}
                                 a ╲ b │ CategoricalValue{String, UInt32} "no"
───────────────────────────────────────┼──────────────────────────────────────
CategoricalValue{String, UInt32} "no"  │                                     2
CategoricalValue{String, UInt32} "yes" │                                     2

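No, it does not, because "yes" never occurs in b and so is not one of its levels. As you can see above, I first created an array that contained "yes" and then subsetted it, so that "yes" was still in the pool of x. You can get the same effect by passing the levels explicitly: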


julia> x = categorical(["no", "no"], levels = ["yes", "no"])
2-element CategoricalArray{String,1,UInt32}:
 "no"
 "no"

julia> freqtable(x)
2-element Named Vector{Int64}
Dim1                                   │ 
───────────────────────────────────────┼──
CategoricalValue{String, UInt32} "yes" │ 0
CategoricalValue{String, UInt32} "no"  │ 2

Please be sure to read the documentation (?categorical at the REPL) for more details.
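
To check which levels a vector actually carries, you can call levels on it. This is only a small sketch reusing the vectors from this thread; the variable names are just for illustration:

```julia
using CategoricalArrays

# Subsetting keeps the whole pool, so "yes" is still a level of x
# even though it no longer occurs in the data.
x = categorical(["no", "no", "yes"])[1:2]
levels(x)

# Without explicit levels, only the observed values become levels,
# which is why freqtable had no "yes" column for b.
b = categorical(["no", "no", "no", "no"])
levels(b)
```

Any level listed there should get its own row or column in freqtable, whether or not it occurs in the data.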


Thank you. I understand now. It was not clear to me why you subsetted the array. I will post a working solution for my specific question below for future reference. Thanks again.

using DataFrames, FreqTables, CategoricalArrays
df2 = DataFrame(
    a = categorical(["yes","yes","no","no"]; levels=["yes","no"]),
    b = categorical(["no","no","no","no"]; levels=["yes","no"])
)
t2 = freqtable(df2, :a, :b)
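
If the columns already exist as categorical vectors, another option (not discussed above, so treat it as a sketch) should be to set the levels afterwards with levels! from CategoricalArrays, assuming the new list contains every value that occurs in the column:

```julia
using DataFrames, FreqTables, CategoricalArrays

df2 = DataFrame(
    a = categorical(["yes", "yes", "no", "no"]),
    b = categorical(["no", "no", "no", "no"])
)

# Declare the full set of levels after the fact; "yes" is unobserved in b.
# df2.b returns the stored column, so it is modified in place.
levels!(df2.b, ["yes", "no"])

t2 = freqtable(df2, :a, :b)
size(t2)  # (2, 2): the "yes" column is present and holds zeros
```

This avoids rebuilding the column, but the levels still have to be listed by hand.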