I received the above question in an email.

Here is my answer:

```
using MLJ # or MLJBase is enough
julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
'b'
'a'
'a'
'a'
'c'
'b'
'c'
'c'
'c'
'b'
julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
'b'
'c'
'c'
'c'
'a'
'b'
'b'
'b'
'a'
'c'
julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
(name = FScore, instances = [f1score], ...)
(name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)
julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)
julia> m(y1, y2)
0.19047619047619047
julia> macro_f1score(y1, y2)
0.19047619047619047
```

Here’s the docstring:

```
help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate
MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)
One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
observations.
MulticlassFScore()(ŷ, y)
MulticlassFScore()(ŷ, y, class_w)
Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
weights. It applies if average=macro_avg or average=no_avg.
For more information, run info(MulticlassFScore).
```

To get the weighted F1 score instead of micro or macro averages, we use `class_w` with `class_w[i]` being the proportion of the i-th class in the dataset, right? Or is there already a parameter for that, e.g. like `micro_avg`?

I believe micro averages (`average=micro_avg`) take into account the relative proportions of the classes, in the sense that the score can be high even if performance for a rare class is poor. So there's no need to manually compute and pass `class_w` for that (only `average=macro_avg` and `average=no_avg` support user-specified class weights). Is that what you are after?
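To make that concrete, here is a small language-agnostic sketch (plain Python with made-up toy labels, not MLJ code) showing how micro averaging can report a high score while a rare class is missed entirely:

```python
def f1(tp, fp, fn):
    """F1 from one-vs-rest counts; 0.0 when there is nothing to score."""
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

y    = ["a"] * 9 + ["b"]   # "b" is the rare class
yhat = ["a"] * 10          # "b" is never predicted

tp_a, fp_a, fn_a = 9, 1, 0   # one-vs-rest counts for "a"
tp_b, fp_b, fn_b = 0, 0, 1   # one-vs-rest counts for "b"

# micro: pool TP/FP/FN across classes first, then compute one F1
micro = f1(tp_a + tp_b, fp_a + fp_b, fn_a + fn_b)           # 0.9

# macro: compute F1 per class, then take the unweighted mean
macro = (f1(tp_a, fp_a, fn_a) + f1(tp_b, fp_b, fn_b)) / 2   # 9/19 ≈ 0.474
```

The micro score stays high because the dominant class swamps the pooled counts; the macro score exposes the total failure on the rare class.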

@ven-k may want to clarify.

The implementation details are approximately here.

If you want to access the per-class weighted F-scores from @ablaom's example above:

```
julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)
julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
'a' => 0.1
'b' => 0.4
'c' => 0.5
julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
"a" => 0.06
"b" => 0.0
"c" => 0.0
julia> m(y1, y2, class_w) # by default, macro_avg is used; notice that it's the same as mean(f1_no_avg)
0.02
```

MLJBase computes this family of multiclass scores as follows:

- `micro_avg` → the M(ulticlass)TP, MTN, … counts are pooled across classes, and then a single MRecall, MFScore, … is computed
- `macro_avg` → MTP, MTN, … are computed per class, MRecall, MFScore, … are computed per class, and the averaged value is returned
- `no_avg` → MTP, MTN, … are computed per class, and the per-class MRecall, MFScore, … are returned
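The three modes can be sketched in plain Python (toy data and hand-rolled counts, not MLJ internals):

```python
# Hypothetical toy predictions over three classes
y    = ["a", "a", "b", "c", "b", "a"]
yhat = ["a", "b", "b", "c", "b", "a"]
classes = ["a", "b", "c"]

# One-vs-rest TP/FP/FN counts per class
counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
for t, p in zip(y, yhat):
    for c in classes:
        counts[c]["tp"] += (t == c and p == c)
        counts[c]["fp"] += (t != c and p == c)
        counts[c]["fn"] += (t == c and p != c)

# micro_avg: pool the counts across classes, then compute one F-score
TP = sum(d["tp"] for d in counts.values())
FP = sum(d["fp"] for d in counts.values())
FN = sum(d["fn"] for d in counts.values())
micro = 2 * TP / (2 * TP + FP + FN)                    # 5/6

# no_avg: one F-score per class
no_avg = {c: 2 * d["tp"] / (2 * d["tp"] + d["fp"] + d["fn"])
          for c, d in counts.items()}                  # {"a": 0.8, "b": 0.8, "c": 1.0}

# macro_avg: the unweighted mean of the per-class scores
macro = sum(no_avg.values()) / len(no_avg)             # ≈ 0.867
```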

Further, weighted scores are supported for both `average=macro_avg` and `average=no_avg`, to suit our broader package design and stay flexible (most scores return a vector, as it's convenient to check per-class values and apply aggregation if necessary).

As class info is not available with `micro_avg`, it's promoted to `macro_avg` whenever weights are passed.
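If what's wanted is the prevalence-weighted F1 from the original question (each class's F1 weighted by its proportion in the data), it can be assembled from the `no_avg` scores. A sketch in plain Python, with hypothetical per-class scores chosen to match the numbers in the example above:

```python
from collections import Counter

def support_weighted_f1(per_class_f1, y_true):
    """Sum of per-class F1 scores, each weighted by the class's share of y_true."""
    counts = Counter(y_true)
    n = len(y_true)
    return sum(f1 * counts[c] / n for c, f1 in per_class_f1.items())

# Class prevalences 0.1 / 0.4 / 0.5, with made-up per-class scores
y_true = ["a"] + ["b"] * 4 + ["c"] * 5
per_class = {"a": 0.6, "b": 0.0, "c": 0.0}
support_weighted_f1(per_class, y_true)   # 0.6 * 0.1 = 0.06
```

Note that, as in the example above, MLJ's `macro_avg` with class weights returns the mean of the weighted per-class scores (0.02 there), whereas this prevalence-weighted convention sums them (0.06); with proportion weights the two differ by a factor equal to the number of classes.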

Hi everyone. I might be late here, but may I ask why I keep getting an error when using `f1score` to compute the F1-score for two literal string vectors, for example `["Yes", "No", "Yes", "Yes", "No"]`? Here is the exact type of my vectors:

```
CategoricalVector{String15, UInt32, String15, CategoricalValue{String15, UInt32}, Union{}} (alias for CategoricalArrays.CategoricalArray{String15, 1, UInt32, String15, CategoricalArrays.CategoricalValue{String15, UInt32}, Union{}})
```

And here is the error:

```
ERROR: MethodError: no method matching _check(::FScore{Float64}, ::MLJBase.ConfusionMatrixObject{3})
Closest candidates are:
_check(::MLJBase.Measure, ::Any, ::Any, ::AbstractArray) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:67
_check(::MLJBase.Measure, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:59
_check(::MLJBase.Measure, ::Any, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:63
...
```

Thanks for posting.

It would be helpful if you could provide a self-contained code snippet, but here's my guess: it seems like you have more than two classes in the pool of your vector(s) (3, to be precise), which means you'll want to use `MulticlassFScore`, not `FScore`, as you appear to be doing. So this works fine for me:

```
using MLJBase
using Random
y = coerce(["yes", "no", "yes", "maybe", "maybe"], Multiclass);
yhat = y[randperm(5)];
multiclass_f1score(yhat, y) # `multiclass_f1score` is an alias for `MulticlassFScore()`
0.16666666666666666
```

That said, it looks like you should have got a more informative error when trying to use `FScore` on non-binary data. It would be good if you could give a complete minimal demonstration of your error.

Yes, you’re correct. I’ve checked my data again and indeed there are more than just two classes in my data. I was so careless! Thank you for your answer.