I received the above question in an email.
Here is my answer:
using MLJ # or MLJBase is enough
julia> y1 = categorical(rand("abc", 10)) # or `coerce(rand("abc", 10), Multiclass)`
10-element CategoricalArray{Char,1,UInt32}:
'b'
'a'
'a'
'a'
'c'
'b'
'c'
'c'
'c'
'b'
julia> y2 = categorical(rand("abc", 10))
10-element CategoricalArray{Char,1,UInt32}:
'b'
'c'
'c'
'c'
'a'
'b'
'b'
'b'
'a'
'c'
julia> measures("FScore")
2-element Vector{NamedTuple{(:name, :instances, :human_name, :target_scitype, :supports_weights, :supports_class_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type), T} where T<:Tuple}:
(name = FScore, instances = [f1score], ...)
(name = MulticlassFScore, instances = [macro_f1score, micro_f1score, multiclass_f1score], ...)
julia> m = MulticlassFScore()
MulticlassFScore(β = 1.0,average = MLJBase.MacroAvg(),return_type = LittleDict)
julia> m(y1, y2)
0.19047619047619047
julia> macro_f1score(y1, y2)
0.19047619047619047
Here’s the docstring:
help?> MulticlassFScore
search: MulticlassFScore multiclass_f1score MulticlassFalseDiscoveryRate
MulticlassFScore(; β=1.0, average=macro_avg, return_type=LittleDict)
One-parameter generalization, F_β, of the F-measure or balanced F-score for multiclass
observations.
MulticlassFScore()(ŷ, y)
MulticlassFScore()(ŷ, y, class_w)
Evaluate the default score on multiclass observations, ŷ, given ground truth values, y.
Options for average are: no_avg, macro_avg (default) and micro_avg. Options for
return_type, applying in the no_avg case, are: LittleDict (default) or Vector. An
optional AbstractDict, denoted class_w above, keyed on levels(y), specifies class
weights. It applies if average=macro_avg or average=no_avg.
For more information, run info(MulticlassFScore).
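As a small illustration (not from the original answer) of the return_type option mentioned in the docstring, assuming the y1 and y2 defined above:
# Sketch: per-class scores returned as a plain Vector instead of the
# default LittleDict (only relevant in the average=no_avg case).
m_vec = MulticlassFScore(average=no_avg, return_type=Vector)
m_vec(y1, y2)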
To get the weighted F1 score, instead of the micro or macro averages, do we use class_w with class_w[i] being the proportion of the i-th class in the dataset? Or is there already a parameter for that, e.g. like micro_avg?
I believe micro averages (average=micro_avg) take into account the relative proportions of the classes, in the sense that the score can be high even if performance on a rare class is poor. So there is no need to manually compute and pass class_w for that (only average=macro_avg and average=no_avg support user-specified class weights). Is that what you are after?
@ven-k may want to clarify.
The implementation details are approximately here.
If you want to access the per-class weighted FScore in the example above by @ablaom:
julia> m_no_avg = MulticlassFScore(average=no_avg)
MulticlassFScore(β = 1.0,average = MLJBase.NoAvg(),return_type = LittleDict)
julia> class_w = LittleDict('a' => 0.1, 'b' => 0.4, 'c' => 0.5)
LittleDict{Char, Float64, Vector{Char}, Vector{Float64}} with 3 entries:
'a' => 0.1
'b' => 0.4
'c' => 0.5
julia> f1_no_avg = m_no_avg(y1, y2, class_w)
LittleDict{String, Float64, Vector{String}, Vector{Float64}} with 3 entries:
"a" => 0.06
"b" => 0.0
"c" => 0.0
julia> m(y1, y2, class_w) # By default, macro_avg is used; notice this is the same as mean(values(f1_no_avg))
0.02
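To verify the relationship noted in the comment above (a sketch; assumes the standard library Statistics for mean):
using Statistics
mean(values(f1_no_avg)) # 0.02, matching m(y1, y2, class_w)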
MLJBase computes this family of multiclass scores as follows:
- micro_avg → M(ulticlass)TP, MTN, … are pooled across classes, and then MRecall, MFScore, … are computed from those totals
- macro_avg → MTP, MTN, … are computed per class; MRecall, MFScore, … are likewise computed per class and the averaged value is returned
- no_avg → MTP, MTN, … are computed per class and the per-class MRecall, MFScore are returned
Further, the weighted scores are supported for both average=macro_avg and average=no_avg, to suit the broader design of the package and stay flexible (most scores return a vector, as it's convenient to check per-class values and apply aggregation if necessary). As class info is not available with micro_avg, it's promoted to macro_avg whenever weights are passed.
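A minimal sketch putting the three modes side by side, assuming the y1, y2 and class_w defined earlier:
MulticlassFScore(average=micro_avg)(y1, y2) # single score from pooled counts
MulticlassFScore(average=macro_avg)(y1, y2) # mean of per-class scores
MulticlassFScore(average=no_avg)(y1, y2)    # per-class scores in a LittleDict
# Passing weights with micro_avg: as noted above, the measure is promoted
# to macro_avg, since per-class information is needed to apply them.
MulticlassFScore(average=micro_avg)(y1, y2, class_w)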
Hi everyone. I might be late here, but may I ask why I keep getting an error when using f1score to compute the F1-score for two plain string vectors, for example ["Yes", "No", "Yes", "Yes", "No"]? Here is the exact type of my vectors:
CategoricalVector{String15, UInt32, String15, CategoricalValue{String15, UInt32}, Union{}} (alias for CategoricalArrays.CategoricalArray{String15, 1, UInt32, String15, CategoricalArrays.CategoricalValue{String15, UInt32}, Union{}})
And here is the error:
ERROR: MethodError: no method matching _check(::FScore{Float64}, ::MLJBase.ConfusionMatrixObject{3})
Closest candidates are:
_check(::MLJBase.Measure, ::Any, ::Any, ::AbstractArray) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:67
_check(::MLJBase.Measure, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:59
_check(::MLJBase.Measure, ::Any, ::Any, ::Any) at C:\Users\tran_\.julia\packages\MLJBase\CtxrQ\src\measures\measures.jl:63
...
Thanks for posting.
It would be helpful if you could provide a self-contained code snippet, but here's my guess: it looks like you have more than two classes in the pool of your vector(s) (3, to be precise), which means you'll want to use MulticlassFScore rather than FScore, which is what you appear to be using. So this works fine for me:
using MLJBase
using Random
y = coerce(["yes", "no", "yes", "maybe", "maybe"], Multiclass);
yhat = y[randperm(5)];
multiclass_f1score(yhat, y) # `multiclass_f1score` is an alias for `MulticlassFScore()`
0.16666666666666666
That said, it looks like you should have got a more informative error when trying to use FScore on non-binary data. It would be good if you could give a complete minimal demonstration of your error.
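For reference, if the data really were binary, something along these lines should work (a sketch, not from the thread; it assumes two levels coerced to OrderedFactor, with the second level treated as the positive class):
using MLJBase
y_bin    = coerce(["Yes", "No", "Yes", "Yes", "No"], OrderedFactor);
yhat_bin = coerce(["Yes", "Yes", "No", "Yes", "No"], OrderedFactor);
f1score(yhat_bin, y_bin) # alias for FScore(); expects exactly two classes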
Yes, you’re correct. I’ve checked my data again and indeed there are more than just two classes. I was so careless! Thank you for your answer.