Hi all,
In my simulation, the results are summarized as a 2×2 frequency table. For example:
using DataFrames, FreqTables
df1 = DataFrame(
a = [:yes,:yes,:no,:no],
b = [:yes,:no,:yes,:no]
)
t1 = freqtable(df1, :a, :b)
Output:
2×2 Named Matrix{Int64}
a ╲ b │ :no :yes
──────┼───────────
:no │ 1 1
:yes │ 1 1
In some cases, every cell in a row or column has a zero count, and that row or column is dropped from the table. For example:
df2 = DataFrame(
a = [:yes,:yes,:no,:no],
b = [:no,:no,:no,:no]
)
t2 = freqtable(df2, :a, :b)
Output:
a ╲ b │ :no
──────┼────
:no │ 2
:yes │ 2
Is there an easy way to add a row or column of zeros in such cases?
Thank you.
Yes, but you need a and b to be CategoricalArrays.
julia> using DataFrames, FreqTables, CategoricalArrays;
julia> x = categorical(["no", "no", "yes"])[1:2]; y = categorical(["up", "down", "up"])[1:2];
julia> df = DataFrame(x = x, y = y);
julia> freqtable(df, :x, :y)
2×2 Named Matrix{Int64}
x ╲ y │ CategoricalValue{String, UInt32} "down" CategoricalValue{String, UInt32} "up"
───────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────
CategoricalValue{String, UInt32} "no" │ 1 1
CategoricalValue{String, UInt32} "yes" │ 0 0
With some pretty unfortunate printing, I guess.
Thank you. Unfortunately, that does not work with my example:
df2 = DataFrame(
a = categorical(["yes","yes","no","no"]),
b = categorical(["no","no","no","no"])
)
t2 = freqtable(df2, :a, :b)
Output:
2×1 Named Matrix{Int64}
a ╲ b │ CategoricalValue{String, UInt32} "no"
───────────────────────────────────────┼──────────────────────────────────────
CategoricalValue{String, UInt32} "no" │ 2
CategoricalValue{String, UInt32} "yes" │ 2
No, it does not. As you can see above, I first created an array that contained "yes" and then subsetted it, so "yes" was still in the pool of levels of x. You can also set the levels explicitly:
julia> x = categorical(["no", "no"], levels = ["yes", "no"])
2-element CategoricalArray{String,1,UInt32}:
"no"
"no"
julia> freqtable(x)
2-element Named Vector{Int64}
Dim1 │
───────────────────────────────────────┼──
CategoricalValue{String, UInt32} "yes" │ 0
CategoricalValue{String, UInt32} "no" │ 2
Please be sure to read the documentation (?categorical in the REPL) for more details.
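If the array already exists, the unobserved level can also be added afterwards instead of at construction time. A minimal sketch, assuming the levels! function from CategoricalArrays and reusing the values of column b from the question (the variable names are only for illustration):

```julia
using CategoricalArrays, FreqTables

# Only "no" occurs in the data, so only "no" is in the pool of levels.
b = categorical(["no", "no", "no", "no"])

# Add the unobserved level "yes"; the new levels must still contain "no".
levels!(b, ["yes", "no"])

# freqtable keeps every level of a categorical array, including empty ones.
t = freqtable(b)
```

The table should then have one entry per level, with a zero count for "yes".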
Thank you. I understand now. It was not clear to me why you subsetted the array. I will post a working solution for my specific question below for future reference. Thanks again.
using DataFrames, FreqTables, CategoricalArrays
df2 = DataFrame(
a = categorical(["yes","yes","no","no"]; levels=["yes","no"]),
b = categorical(["no","no","no","no"]; levels=["yes","no"])
)
t2 = freqtable(df2, :a, :b)
Does anyone know if this is possible outside of DataFrames? Consider the following:
julia> using FreqTables
julia> g = prop(freqtable([1, 2, 2, 3, 4, 3],[5, 6, 6, 7, 9, 8]), margins=1)
4×5 Named Matrix{Float64}
Dim1 ╲ Dim2 │ 5 6 7 8 9
────────────┼────────────────────────
1 │ 1.0 0.0 0.0 0.0 0.0
2 │ 0.0 1.0 0.0 0.0 0.0
3 │ 0.0 0.0 0.5 0.5 0.0
4 │ 0.0 0.0 0.0 0.0 1.0
while what I would like it to return is
9×9 Named Matrix{Float64}
Dim1 ╲ Dim2 │ 1 2 3 4 5 6 7 8 9
────────────┼────────────────────────────────────────────
1 │ 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
2 │ 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
3 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0
4 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
5 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
8 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9 │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I can write a wrapper function that re-inserts the 0s on the back-end, but was just wondering if another utility already provides this feature.
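For reference, such a wrapper could look roughly like the sketch below. It uses only Base Julia and returns a plain Matrix rather than a Named Matrix; full_counts is a hypothetical helper name, not something FreqTables provides.

```julia
# Hypothetical helper (not part of FreqTables): count (x, y) pairs over fixed
# sets of levels, so levels that never occur still get a row/column of zeros.
function full_counts(x, y, xlevels, ylevels)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    # Map each level to its row/column position.
    row = Dict(l => i for (i, l) in enumerate(xlevels))
    col = Dict(l => j for (j, l) in enumerate(ylevels))
    counts = zeros(Int, length(xlevels), length(ylevels))
    for (xi, yi) in zip(x, y)
        counts[row[xi], col[yi]] += 1
    end
    return counts
end

x = [1, 2, 2, 3, 4, 3]
y = [5, 6, 6, 7, 9, 8]
counts = full_counts(x, y, 1:9, 1:9)

# Row proportions. Dividing by max(rowsum, 1) leaves rows without any
# observations as zeros instead of producing 0/0 = NaN.
rowsums = sum(counts, dims = 2)
props = counts ./ max.(rowsums, 1)
```

Here props is a 9×9 Matrix{Float64} whose first four rows hold the row proportions from the table above and whose remaining rows are all zero.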
You can use CategoricalArrays for this:
julia> using CategoricalArrays
julia> x = [1, 2, 2, 3, 4, 3]; y = [5, 6, 6, 7, 9, 8];  # the data from the question
julia> x_c = categorical(x; levels = 1:9);
julia> y_c = categorical(y; levels = 1:9);
julia> prop(freqtable(x_c, y_c))
9×9 Named Matrix{Float64}
Dim1 ╲ Dim2 │ 1 2 3 … 7 8 9
────────────┼──────────────────────────────────────────────────────────────
1 │ 0.0 0.0 0.0 … 0.0 0.0 0.0
2 │ 0.0 0.0 0.0 0.0 0.0 0.0
3 │ 0.0 0.0 0.0 0.166667 0.166667 0.0
4 │ 0.0 0.0 0.0 0.0 0.0 0.166667
5 │ 0.0 0.0 0.0 0.0 0.0 0.0
6 │ 0.0 0.0 0.0 0.0 0.0 0.0
7 │ 0.0 0.0 0.0 0.0 0.0 0.0
8 │ 0.0 0.0 0.0 0.0 0.0 0.0
9 │ 0.0 0.0 0.0 … 0.0 0.0 0.0
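Note that the call above omits margins, so each cell is a share of the grand total (0.166667 = 1/6). To get the row proportions asked for in the question, margins=1 can be passed as in the original call. A minimal sketch with the same data; I would expect the rows without any observations (5 to 9) to come out as NaN (0/0) rather than 0.0, so check the result before relying on it:

```julia
using FreqTables, CategoricalArrays

x = [1, 2, 2, 3, 4, 3]
y = [5, 6, 6, 7, 9, 8]

# Declare the full set of levels so that empty rows and columns are kept.
x_c = categorical(x; levels = 1:9)
y_c = categorical(y; levels = 1:9)

# Proportions within each row, as in the question.
g = prop(freqtable(x_c, y_c), margins = 1)
```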
Thanks! My understanding was that CategoricalArrays only worked with DataFrames, but I should have read more before posting my question. I appreciate it!
No. CategoricalArrays.jl is entirely distinct from DataFrames.jl. In fact, it is not even a dependency of DataFrames.jl.