Extracting values from UnivariateFinite

So MLJ gave me, as its output, a UnivariateFinite data structure; I have:

UnivariateFinite{OrderedFactor{2}}(0=>0.937, 1=>0.0626)

Is there a way I can extract, as an array, the probabilities from this distribution (other than manually copying and pasting)?

Maybe:

julia> x = UnivariateFinite([0, 1], [0.9, 0.1])
┌ Warning: No `CategoricalValue` found from which to extract a complete pool of classes. Creating a new pool (ordered=false). You can:
│  (i) specify `pool=missing` to suppress this warning; or
│  (ii) use an existing pool by specifying `pool=c` where `c` is a `CategoricalArray`, `CategoricalValue` or CategoricalPool`.
│ In case (i) specify `ordered=true` if samples are to be `OrderedFactor`. 
└ @ MLJBase ~/.julia/packages/MLJBase/AkJde/src/univariate_finite/types.jl:262
UnivariateFinite{Multiclass{2}}(0=>0.9, 1=>0.1)

julia> x.prob_given_ref
OrderedCollections.LittleDict{UInt8, Float64, Vector{UInt8}, Vector{Float64}} with 2 entries:
  0x01 => 0.9
  0x02 => 0.1

@nilshg’s suggestion will work but is not recommended, because the `prob_given_ref` field is not part of the public API.

Accessing the probabilities is described in the Working with Categorical Data section of the manual (see also this section of “Getting Started”). Here are some more examples:

julia> y = coerce(["c", "b", "a"], Multiclass)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "a"

julia> d = UnivariateFinite(["a", "c"], [0.1, 0.9], pool=y)
UnivariateFinite{Multiclass{3}}(a=>0.1, c=>0.9)

julia> pdf(d, "a")
0.1

julia> pdf(d, levels(y))
3-element Vector{Float64}:
 0.1
 0.0
 0.9
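
Applied to the binary case in the original question, the same pattern might look like the sketch below. The values of `y` and the probabilities are made up for illustration, and it assumes `using MLJ` brings `coerce`, `UnivariateFinite`, `pdf` and `levels` into scope:

```julia
using MLJ  # assumption: MLJ is installed and exports the names used below

# Hypothetical ordered binary target with classes 0 and 1
y = coerce([0, 1, 1, 0], OrderedFactor)

# A distribution like the one in the question (illustrative probabilities)
d = UnivariateFinite([0, 1], [0.937, 0.063], pool=y)

# Probabilities as a plain vector, in the same order as levels(y)
probs = pdf(d, levels(y))
```

Passing the levels explicitly makes the ordering of the returned probabilities unambiguous, which a dictionary-like internal field does not guarantee.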

And for a vector of distributions:

julia> d_vector = UnivariateFinite(["a", "b"], [0.1 0.9; 0.4 0.6], pool=missing)
2-element MLJBase.UnivariateFiniteVector{Multiclass{2}, String, UInt8, Float64}:
 UnivariateFinite{Multiclass{2}}(a=>0.1, b=>0.9)
 UnivariateFinite{Multiclass{2}}(a=>0.4, b=>0.6)

julia> broadcast(pdf, d_vector, "a")
2-element Vector{Float64}:
 0.1
 0.4

julia> pdf(d_vector, ["a", "b"])
2×2 Matrix{Float64}:
 0.1  0.9
 0.4  0.6

julia> pdf(d_vector, ["b", "a"])
2×2 Matrix{Float64}:
 0.9  0.1
 0.6  0.4
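
As a small consistency check, the vectorised and matrix forms above can be compared directly. This is only a sketch reusing the calls shown above, again assuming `using MLJ`; the names `p_a` and `P` are introduced here for illustration:

```julia
using MLJ  # assumption: MLJ is installed and exports UnivariateFinite and pdf

d_vector = UnivariateFinite(["a", "b"], [0.1 0.9; 0.4 0.6], pool=missing)

# Dot syntax is shorthand for broadcast(pdf, d_vector, "a")
p_a = pdf.(d_vector, "a")

# One row per distribution, one column per requested class, in the order given
P = pdf(d_vector, ["a", "b"])

# The first column of the matrix holds the probabilities of class "a"
@assert P[:, 1] ≈ p_a

# Each row covers the full support here, so it sums to one
@assert all(sum(P, dims=2) .≈ 1)
```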

In a basic MLJ workflow you shouldn’t really need the probabilities in matrix form. For example, all probabilistic measures in MLJ (e.g., LogLoss()) expect distributions as the first argument, not numerical probabilities or parameters:

julia> y = coerce(rand(["a", "b"], 10), OrderedFactor)
10-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "b"
 "b"
 "b"
 "a"
 "b"
 "a"
 "a"
 "a"

julia> yhat = UnivariateFinite(["a", "b"], rand(10), augment=true, pool=y)
10-element MLJBase.UnivariateFiniteVector{OrderedFactor{2}, String, UInt32, Float64}:
 UnivariateFinite{OrderedFactor{2}}(a=>0.863, b=>0.137)
 UnivariateFinite{OrderedFactor{2}}(a=>0.995, b=>0.00547)
 UnivariateFinite{OrderedFactor{2}}(a=>0.0523, b=>0.948)
 UnivariateFinite{OrderedFactor{2}}(a=>0.859, b=>0.141)
 UnivariateFinite{OrderedFactor{2}}(a=>0.216, b=>0.784)
 UnivariateFinite{OrderedFactor{2}}(a=>0.277, b=>0.723)
 UnivariateFinite{OrderedFactor{2}}(a=>0.985, b=>0.0148)
 UnivariateFinite{OrderedFactor{2}}(a=>0.206, b=>0.794)
 UnivariateFinite{OrderedFactor{2}}(a=>0.373, b=>0.627)
 UnivariateFinite{OrderedFactor{2}}(a=>0.553, b=>0.447)

julia> LogLoss()(yhat, y)
10-element Vector{Float64}:
 0.14702211036373602
 5.20855707692713
 0.053705325134450276
 1.9585781504187798
 0.24387484194022874
 1.2835655183125487
 4.210745321783671
 1.5783092501620064
 0.9856382706229556
 0.5921152165139617

Hope this helps!

@gideonsimpson P.S. It would be good if you could add “mlj” to the tags, thanks.

Thanks for clarifying, I’ve added the tag.