How do I compute a multiclass F-score using MLJ?

I received the above question in an email.


Here is my answer:

using MLJ # or just `using MLJBase`, which is enough

julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
 'b'
 'a'
 'a'
 'a'
 'c'
 'b'
 'c'
 'c'
 'c'
 'b'

julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
 'b'
 'c'
 'c'
 'c'
 'a'
 'b'
 'b'
 'b'
 'a'
 'c'

julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
 (name = FScore, instances = [f1score], ...)
 (name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)

julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)

julia> m(y1, y2)
0.19047619047619047

julia> macro_f1score(y1, y2)
0.19047619047619047

Here’s the docstring:

help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate

  MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)

  One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
  observations.

  MulticlassFScore()(ŷ, y)
  MulticlassFScore()(ŷ, y, class_w)

  Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
  Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
  return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
  optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
  weights. It applies if average=macro_avg or average=no_avg.

  For more information, run info(MulticlassFScore).

To get the weighted F1 score instead of the micro or macro averages, do we use class_w, with class_w[i] being the proportion of the i-th class in the dataset? Or is there already a parameter for that, e.g. like micro_avg?

I believe micro averages (average=micro_avg) take the relative class proportions into account, in the sense that the score can be high even if performance on a rare class is poor. So there is no need to manually compute and pass class_w for that (only average=macro_avg and average=no_avg support user-specified class weights). Is that what you are after?
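To see this concretely, here is a sketch (the toy labels below are made up; it assumes MLJBase and the micro_f1score/macro_f1score instances listed earlier) on an imbalanced example in which the rare class 'c' is never predicted:

```julia
using MLJBase  # provides categorical, micro_f1score, macro_f1score

# 'a' dominates; the single 'c' observation is misclassified as 'b':
y = categorical(collect("aaaaaaaabc"), levels=['a', 'b', 'c'])  # ground truth
ŷ = categorical(collect("aaaaaaaabb"), levels=['a', 'b', 'c'])  # predictions

micro_f1score(ŷ, y)  # pooled counts: stays high although 'c' is missed entirely
macro_f1score(ŷ, y)  # unweighted mean over classes: pulled down by 'c'
```

Since 9 of the 10 observations are correct, the micro score stays high, while the macro score drops because class 'c' contributes an F-score of zero.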

@ven-k may want to clarify.

The implementation details are approximately here.


If you want to access the per-class weighted FScore in the example above by @ablaom:

julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)

julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
  'a' => 0.1
  'b' => 0.4
  'c' => 0.5

julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
  "a" => 0.06
  "b" => 0.0
  "c" => 0.0

julia> m(y1, y2, class_w)  # macro_avg is the default; notice this is the same as mean(values(f1_no_avg))
0.02

MLJBase computes this family of multiclass scores as follows:

  • micro_avg → the multiclass counts MTP, MTN, … (M for Multiclass) are pooled across all classes, and MRecall, MFScore, … are then computed from the pooled counts
  • macro_avg → MTP, MTN, … are computed per class, MRecall, MFScore, … are computed per class, and the mean over classes is returned
  • no_avg → MTP, MTN, … are computed per class, and the per-class MRecall, MFScore, … are returned
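The arithmetic behind these three modes can be checked by hand. The following is a plain-Julia sketch of that scheme (a hand-rolled illustration, not MLJBase's actual implementation):

```julia
# Hand-rolled sketch of no_avg / macro_avg / micro_avg F1 (β = 1).
function f1_modes(ŷ, y, classes)
    f1(tp, fp, fn) = tp == 0 ? 0.0 : 2tp / (2tp + fp + fn)
    tps = [count(i -> ŷ[i] == c && y[i] == c, eachindex(y)) for c in classes]
    fps = [count(i -> ŷ[i] == c && y[i] != c, eachindex(y)) for c in classes]
    fns = [count(i -> ŷ[i] != c && y[i] == c, eachindex(y)) for c in classes]
    per_class = f1.(tps, fps, fns)                # no_avg: one score per class
    macro_avg = sum(per_class) / length(classes)  # macro_avg: mean of per-class scores
    micro_avg = f1(sum(tps), sum(fps), sum(fns))  # micro_avg: pool counts first
    return (; per_class, macro_avg, micro_avg)
end

ŷ = collect("aaaaaaaabb")
y = collect("aaaaaaaabc")
f1_modes(ŷ, y, ['a', 'b', 'c'])
```

On these labels, class 'a' scores 1.0, 'b' scores 2/3 and 'c' scores 0.0, so the macro average is 5/9 ≈ 0.56, while pooling the counts first gives a micro average of 0.9.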

Further, weighted scores are supported for both average=macro_avg and average=no_avg, to suit the broader design of the package and remain flexible (most scores return a vector, as it is convenient to check per-class values and apply an aggregation if necessary).

As per-class information is not available with micro_avg, the measure is promoted to macro_avg whenever class weights are passed.
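For example (a sketch assuming that promotion behavior; the labels and weights below are made up), supplying class weights should make the micro- and macro-averaged measures agree:

```julia
using MLJBase

y = categorical(collect("abcabcabca"))  # ground truth
ŷ = categorical(collect("abcbcaabca"))  # predictions
w = Dict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)  # any AbstractDict keyed on levels(y)

m_micro = MulticlassFScore(average=micro_avg)
m_macro = MulticlassFScore(average=macro_avg)

# Passing class weights promotes micro_avg to macro_avg, so these should agree:
m_micro(ŷ, y, w)
m_macro(ŷ, y, w)
```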
