I received the above question in an email.
Here is my answer:
using MLJ # or MLJBase is enough
julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
'b'
'a'
'a'
'a'
'c'
'b'
'c'
'c'
'c'
'b'
julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
'b'
'c'
'c'
'c'
'a'
'b'
'b'
'b'
'a'
'c'
julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
(name = FScore, instances = [f1score], ...)
(name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)
julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)
julia> m(y1, y2)
0.19047619047619047
julia> macro_f1score(y1, y2)
0.19047619047619047
Here’s the docstring:
help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate
MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)
One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
observations.
MulticlassFScore()(ŷ, y)
MulticlassFScore()(ŷ, y, class_w)
Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
weights. It applies if average=macro_avg or average=no_avg.
For more information, run info(MulticlassFScore).
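To illustrate those options (a sketch reusing the y1 and y2 defined above; outputs are omitted because the data is random):

julia> MulticlassFScore(average=micro_avg)(y1, y2)                   # single pooled score
julia> MulticlassFScore(average=no_avg)(y1, y2)                      # LittleDict of per-class scores
julia> MulticlassFScore(average=no_avg, return_type=Vector)(y1, y2)  # per-class scores as a Vector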
To get the weighted F1 score instead of micro or macro averages, we use class_w with class_w[i] being the proportion of the i-th class in the dataset, right? Or is there already a parameter for that, e.g. like micro_avg?
I believe micro averages (average=micro_avg) take into account the relative proportions of the classes, in the sense that the score can be high even if performance for a rare class is poor. So there is no need to manually compute and pass class_w for that (only average=macro_avg and average=no_avg support user-specified class weights). Is that what you are after?
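For example, here is a small sketch with a deliberately imbalanced, hand-made target (labels and values invented for illustration), where the rare class 'c' is partly misclassified; if I've computed by hand correctly, the micro average stays high while the macro average drops:

julia> y = coerce(['a','a','a','a','a','a','a','a','c','c'], Multiclass);
julia> ŷ = coerce(['a','a','a','a','a','a','a','a','a','c'], Multiclass);
julia> MulticlassFScore(average=micro_avg)(ŷ, y)  # ≈ 0.90: dominated by the common class
julia> MulticlassFScore(average=macro_avg)(ŷ, y)  # ≈ 0.80: the rare class counts equally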
@ven-k may want to clarify.
The implementation details are approximately here.
If you want to access the per-class weighted F-scores in the example above by @ablaom:
julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)
julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
'a' => 0.1
'b' => 0.4
'c' => 0.5
julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
"a" => 0.06
"b" => 0.0
"c" => 0.0
julia> m(y1, y2, class_w) # by default macro_avg is used; notice this is the same as mean(f1_no_avg)
0.02
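As a quick sanity check on that comment (Statistics is from the standard library):

julia> using Statistics
julia> mean(values(f1_no_avg))
0.02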
MLJBase computes this family of multiclass scores as follows (a small sketch relating the modes appears after the list):
- micro_avg → the M(ulticlass)TP, MTN, … counts are pooled across classes, and then single MRecall, MFScore, … values are computed from the pooled counts
- macro_avg → MTP, MTN, … are computed per class, MRecall, MFScore, … are computed per class, and the averaged value is returned
- no_avg → MTP, MTN, … are computed per class and the per-class MRecall, MFScore, … values are returned
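Here is the sketch mentioned above, relating the modes on the y1 and y2 from earlier (exact numbers depend on the random data):

julia> per_class = MulticlassFScore(average=no_avg, return_type=Vector)(y1, y2);
julia> using Statistics
julia> mean(per_class) ≈ MulticlassFScore(average=macro_avg)(y1, y2)  # macro = mean of per-class scores
true
julia> MulticlassFScore(average=micro_avg)(y1, y2)  # needs the pooled counts; not recoverable from per_class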
Further, weighted scores are supported for both average=macro_avg and average=no_avg, to suit the broader design of the package and stay flexible (most scores return a vector, as it's convenient to check per-class values and apply aggregation if necessary). As class info is not available with micro_avg, it is promoted to macro_avg whenever weights are passed.
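So, if I read the promotion behaviour correctly, these two calls should agree (a sketch reusing y1, y2 and class_w from above):

julia> MulticlassFScore(average=micro_avg)(y1, y2, class_w)  # weights passed, so promoted to macro
julia> MulticlassFScore(average=macro_avg)(y1, y2, class_w)  # expected to give the same result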