How do I compute a multiclass F-score using MLJ?

I received the above question in an email.


Here is my answer:

using MLJ # or MLJBase is enough

julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
 'b'
 'a'
 'a'
 'a'
 'c'
 'b'
 'c'
 'c'
 'c'
 'b'

julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
 'b'
 'c'
 'c'
 'c'
 'a'
 'b'
 'b'
 'b'
 'a'
 'c'

julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
 (name = FScore, instances = [f1score], ...)
 (name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)

julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)

julia> m(y1, y2)
0.19047619047619047

julia> macro_f1score(y1, y2)
0.19047619047619047

Here’s the docstring:

help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate

  MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)

  One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
  observations.

  MulticlassFScore()(ŷ, y)
  MulticlassFScore()(ŷ, y, class_w)

  Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
  Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
  return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
  optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
  weights. It applies if average=macro_avg or average=no_avg.

  For more information, run info(MulticlassFScore).
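
For instance, applied to the y1 and y2 above, the documented options might be exercised like this (an untested sketch; outputs omitted and the class-weight values are made up):

MulticlassFScore(β=0.5)(y1, y2)                               # F_β with β = 0.5 instead of F1
MulticlassFScore(average=no_avg, return_type=Vector)(y1, y2)  # per-class scores as a plain Vector
MulticlassFScore()(y1, y2, Dict('a' => 2.0, 'b' => 1.0, 'c' => 1.0))  # macro average with class weights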

To get the weighted F1 score, instead of the micro or macro averages, do we use class_w with class_w[i] set to the proportion of the i-th class in the dataset? Or is there already a parameter for that, e.g. like micro_avg?

I believe micro averages (average=micro_avg) take into account the relative proportions of the classes, in the sense that the score can be high, even if performance for a rare class is poor. So no need to manually compute and pass class_w for that (only average=macro_avg and average=no_avg support user-specified class weights). Is that what you are after?
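
If what you are after is the support-weighted F1 (the analogue of sklearn's average="weighted"), one possible recipe, just an untested sketch, is to pass the ground-truth class proportions as class_w with average=no_avg and sum the result; props, per_class and weighted_f1 are names introduced here only for illustration:

props = Dict(c => sum(y2 .== c) / length(y2) for c in levels(y2))  # class proportions of the ground truth y2
per_class = MulticlassFScore(average=no_avg)(y1, y2, props)        # proportion-weighted per-class F1 scores
weighted_f1 = sum(values(per_class))                               # i.e. sum over classes of proportion * per-class F1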

@ven-k may want to clarify.

The implementation details are approximately here.


If you want to access the per-class weighted FScore in the example above by @ablaom:

julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)

julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
  'a' => 0.1
  'b' => 0.4
  'c' => 0.5

julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
  "a" => 0.06
  "b" => 0.0
  "c" => 0.0

julia> m(y1, y2, class_w)  # by default macro_avg is used; notice this is mean(values(f1_no_avg))
0.02

MLJBase computes this family of multiclass scores as follows:

  • micro_avg → the M(ulticlass)TP, MTN, … counts are pooled across all classes and then a single MRecall, MFScore, … is computed
  • macro_avg → MTP, MTN, … are computed per class, MRecall, MFScore, … are computed per class, and their average is returned
  • no_avg → MTP, MTN, … are computed per class and the per-class MRecall, MFScore, … are returned

Further, weighted scores are supported for both average=macro_avg and average=no_avg, to suit the broader design of the package and stay flexible (most scores return a vector, as it's convenient to check per-class values and apply aggregation if necessary).

As per-class information is not available with micro_avg, the measure is promoted to macro_avg whenever class weights are passed.
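
Concretely, using the y1 and y2 from the answer above, the three modes correspond to (an illustrative sketch; outputs omitted):

micro_f1score(y1, y2)                     # TP, FP, … pooled across classes, then one F1
macro_f1score(y1, y2)                     # F1 computed per class, then averaged
MulticlassFScore(average=no_avg)(y1, y2)  # F1 per class, returned without aggregation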


Hi everyone. I might be late here, but may I ask why I keep getting an error when using f1score to compute the F1-score for two string vectors, for example ["Yes", "No", "Yes", "Yes", "No"]? Here is the exact type of my vectors:

CategoricalVector{String15, UInt32, String15, CategoricalValue{String15, UInt32}, Union{}} (alias for CategoricalArrays.CategoricalArray{String15, 1, UInt32, String15, CategoricalArrays.CategoricalValue{String15, UInt32}, Union{}})

And here is the error:

ERROR: MethodError: no method matching _check(::FScore{Float64}, ::MLJBase.ConfusionMatrixObject{3})
Closest candidates are:
  _check(::MLJBase.Measure, ::Any, ::Any, ::AbstractArray) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:67
  _check(::MLJBase.Measure, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:59
  _check(::MLJBase.Measure, ::Any, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:63
  ...

Thanks for posting.

It would be helpful if you could provide a self-contained code snippet, but here's my guess: it seems like you have more than two classes in the pool of your vector(s) (three, to be precise), which means you'll want to use MulticlassFScore, not FScore as you appear to be doing. So this works fine for me:

using MLJBase
using Random
y = coerce(["yes", "no", "yes", "maybe", "maybe"], Multiclass);
yhat = y[randperm(5)];

multiclass_f1score(yhat, y)    # `multiclass_f1score` is an alias for `MulticlassFScore()`
0.16666666666666666

That said, it looks like you should have got a more informative error when trying to use FScore on non-binary data. It would be good if you could give a complete, minimal demonstration of your error.
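
For the genuinely binary case, something along these lines should work (an untested sketch; I believe the binary measures expect an OrderedFactor target, with the last level treated as "positive" by default):

using MLJBase
y = coerce(["Yes", "No", "Yes", "Yes", "No"], OrderedFactor)  # two levels, ordered "No" < "Yes"
yhat = y[[1, 2, 2, 3, 5]]   # hypothetical predictions, sharing y's categorical pool
f1score(yhat, y)            # binary F1, treating "Yes" as the positive class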


Yes, you're correct. I've checked my data again and indeed there are more than just two classes in my data. I was so careless! Thank you for your answer :heart: