# How do I compute a multiclass F-score using MLJ?

I received the above question in an email.


Here is my answer:

``````julia
using MLJ  # or MLJBase is enough

julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
'b'
'a'
'a'
'a'
'c'
'b'
'c'
'c'
'c'
'b'

julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
'b'
'c'
'c'
'c'
'a'
'b'
'b'
'b'
'a'
'c'

julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
(name = FScore, instances = [f1score], ...)
(name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)

julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)

julia> m(y1, y2)
0.19047619047619047

julia> macro_f1score(y1, y2)
0.19047619047619047
``````

Here’s the docstring:

``````
help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate

MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)

One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
observations.

MulticlassFScore()(ŷ, y)
MulticlassFScore()(ŷ, y, class_w)

Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
weights. It applies if average=macro_avg or average=no_avg.

``````

To get a weighted F1 score instead of the micro or macro average, do we pass `class_w` with `class_w[i]` set to the proportion of the i-th class in the dataset? Or is there already a parameter for that, e.g. something like `micro_avg`?

I believe micro averages (`average=micro_avg`) take into account the relative proportions of the classes, in the sense that the score can be high, even if performance for a rare class is poor. So no need to manually compute and pass `class_w` for that (only `average=macro_avg` and `average=no_avg` support user-specified class weights). Is that what you are after?
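To illustrate the difference, here is a dependency-free sketch of the arithmetic (not using MLJ itself) on a toy example where a rare class is always misclassified:

``````julia
# Toy imbalanced example: class "b" is rare and never predicted.
y    = ["a", "a", "a", "a", "b"]   # ground truth
yhat = ["a", "a", "a", "a", "a"]   # predictions

classes = unique(y)

# Per-class true positives, false positives, false negatives:
tp(c) = count(i -> yhat[i] == c && y[i] == c, eachindex(y))
fp(c) = count(i -> yhat[i] == c && y[i] != c, eachindex(y))
fn(c) = count(i -> yhat[i] != c && y[i] == c, eachindex(y))

# F1 from counts, with a 0/0 guard for classes that never appear:
f1(t, p, n) = (2t + p + n == 0) ? 0.0 : 2t / (2t + p + n)

# macro: per-class F1 first, then an unweighted mean
macro_f1 = sum(f1(tp(c), fp(c), fn(c)) for c in classes) / length(classes)

# micro: pool the counts across classes first, then one F1
TP, FP, FN = sum(tp, classes), sum(fp, classes), sum(fn, classes)
micro_f1 = 2TP / (2TP + FP + FN)

macro_f1, micro_f1   # micro (0.8) is high despite the rare-class failure; macro (≈0.44) is not
``````

The micro score is dominated by the majority class, which is why it reflects class proportions without any explicit `class_w`.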

@ven-k may want to clarify.

The implementation details are approximately here.


If you want to access the per-class weighted FScore in @ablaom's example above:

``````julia
julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)

julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
'a' => 0.1
'b' => 0.4
'c' => 0.5

julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
"a" => 0.06
"b" => 0.0
"c" => 0.0

julia> m(y1, y2, class_w)  # By default, macro_avg is used; notice that it's the same as mean(f1_no_avg)
0.02
``````

MLJBase computes this family of multiclass scores as follows:

• micro_avg → the M(ulticlass)TP, MTN, … counts are pooled across classes, and then MRecall, MFScore, … are computed once from the pooled counts
• macro_avg → MTP, MTN, … are computed per class, MRecall, MFScore, … are computed per class, and the averaged value is returned
• no_avg → MTP, MTN, … are computed per class and the per-class MRecall, MFScore, … are returned

Further, weighted scores are supported for both `average=macro_avg` and `average=no_avg` to suit the broader design of the package and stay flexible (most scores can return a vector, as it's convenient to check per-class values and apply an aggregation if necessary).

As per-class information is not available with `micro_avg`, the measure is promoted to `macro_avg` whenever class weights are passed.
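A sketch of that promotion (assuming an environment with MLJBase loaded; with uniform weights of 1.0 the weighted macro score coincides with the plain macro score, which makes the promotion visible):

``````julia
using MLJBase

# Small deterministic example with three classes:
y    = coerce(["a", "a", "a", "b", "c"], Multiclass);
yhat = coerce(["a", "a", "b", "b", "c"], Multiclass);

m_micro = MulticlassFScore(average=micro_avg)
m_micro(yhat, y)   # ordinary micro-averaged F1

# Passing class weights with `average=micro_avg`: the measure is
# promoted to macro averaging, since micro pooling has no per-class scores.
class_w = Dict("a" => 1.0, "b" => 1.0, "c" => 1.0)  # uniform weights
m_micro(yhat, y, class_w) ≈ macro_f1score(yhat, y)  # expect true under promotion
``````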


Hi everyone. I might be late here, but may I ask why I keep getting an error when using `f1score` to compute the F1-score for two string vectors, for example `["Yes", "No", "Yes", "Yes", "No"]`? Here is the exact type of my vectors.

``````
CategoricalVector{String15, UInt32, String15, CategoricalValue{String15, UInt32}, Union{}} (alias for CategoricalArrays.CategoricalArray{String15, 1, UInt32, String15, CategoricalArrays.CategoricalValue{String15, UInt32}, Union{}})
``````

And here is the error:

``````
ERROR: MethodError: no method matching _check(::FScore{Float64}, ::MLJBase.ConfusionMatrixObject{3})
Closest candidates are:
_check(::MLJBase.Measure, ::Any, ::Any, ::AbstractArray) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:67
_check(::MLJBase.Measure, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:59
_check(::MLJBase.Measure, ::Any, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:63
...
``````

Thanks for posting.

It would be helpful if you could provide a self-contained code snippet, but here's my guess: it seems you have more than two classes in the pool of your vector(s) (three, to be precise), which means you'll want to use `MulticlassFScore`, not `FScore` as you appear to be doing. So this works fine for me:

``````julia
using MLJBase
using Random
y = coerce(["yes", "no", "yes", "maybe", "maybe"], Multiclass);
yhat = y[randperm(5)];

multiclass_f1score(yhat, y)    # `multiclass_f1score` is an alias for `MulticlassFScore()`
0.16666666666666666
``````

That said, it looks like you should have got a more informative error when trying to use `FScore` on non-binary data. It would be good if you could give a complete minimal demonstration of your error.
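For the genuinely two-class case, `f1score` itself should work once the data is coerced to an ordered binary type. A minimal sketch (an assumption here: the errors are made symmetric across the two classes, so the score does not depend on which level is treated as positive):

``````julia
using MLJBase

# Binary target: coerce to OrderedFactor so a positive class is defined.
y    = coerce(["yes", "no", "yes", "no"], OrderedFactor);
yhat = coerce(["yes", "no", "no", "yes"], OrderedFactor);

f1score(yhat, y)   # one "yes" and one "no" misclassified → 0.5 for either positive class
``````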


Yes, you’re correct. I’ve checked my data again and indeed there are more than just two classes in my data. I was so careless! Thank you for your answer.