Changing the size of the `target` using MLJ's Learning Networks

Hi. I have a static transformer (<: Static) that changes the size of the input (X). It computes sliding windows, so that size(X_tag)[1] == size(X)[1] - window_size.

I'm trying to figure out the right way to change the size of the target variable (y) to match, using MLJ's learning network API.

Concretely, I run into a problem when evaluate!(...)-ing the composed model, due to a size mismatch in the target variable.
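For concreteness, the transformer looks roughly like this (a simplified sketch; the Windower name and the exact matrix layout are placeholders, not my actual code):

using MLJBase

mutable struct Windower <: Static
    window_size::Int
end

function MLJBase.transform(t::Windower, _, X)
    Xm = matrix(X)
    n = size(Xm, 1)
    # output row i stacks input rows i:(i + window_size), so there are
    # n - window_size output rows
    rows = [vec(Xm[i:i + t.window_size, :]) for i in 1:(n - t.window_size)]
    table(permutedims(reduce(hcat, rows)))
end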

Thanks @barakber for giving MLJ’s learning networks a spin.

MLJBase/MLJ provide a method, selectrows, which is overloaded for nodes, so you can use it immediately:

julia> y = source(rand(3))
Source @577 ⏎ `AbstractVector{Continuous}`

julia> ysmall = selectrows(y, 1:2)
Node @844
  args:
    1:	Source @577
  formula:
    #117(
      Source @577)

julia> ysmall()
2-element Vector{Float64}:
 0.4237311687983826
 0.6169364885950386

It works for any vector, matrix or table. Or, to do this “by hand”:

julia> ysmall = node(yy->yy[1:2], y)
Node @036
  args:
    1:	Source @577
  formula:
    #1(
      Source @577)

julia> ysmall()
2-element Vector{Float64}:
 0.4237311687983826
 0.6169364885950386
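For example (a quick sketch, assuming only MLJBase is loaded), the same thing works for a table source:

using MLJBase

X = source((a = rand(5), b = rand(5)))  # a column table with 5 rows
Xsmall = selectrows(X, 1:3)             # a node, as above
Xsmall()                                # a table with 3 rows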

The BalancedModel wrapper from MLJBalancing reduces the number of observations in the input data, and is implemented using learning networks, so it might be another example to look at. There the oversamplers/undersamplers act on X and y simultaneously (but they are also Static); see the sketch below.
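To sketch that pattern (hypothetical code; the TrimBoth name and details are mine, not MLJBalancing's): a Static transformer may take more than one data argument, so it can trim X and y together, and the resulting tuple-valued node can be split with node:

using MLJBase

mutable struct TrimBoth <: Static
    n::Int
end

# drop the last n observations from X and y simultaneously
function MLJBase.transform(t::TrimBoth, _, X, y)
    keep = 1:(nrows(X) - t.n)
    return selectrows(X, keep), selectrows(y, keep)
end

Xs = source((a = rand(20), b = rand(20)))
ys = source(rand(20))
mach = machine(TrimBoth(10))
out  = transform(mach, Xs, ys)             # node returning the (X, y) tuple
X1, y1 = node(first, out), node(last, out)
fit!(y1)                                   # trivially fits the static machine
y1()                                       # 10-element vector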

This Stacking tutorial may provide further clues (although that tutorial contains some deprecated syntax).

If this is not sufficiently helpful, I may need a little more context.

Thanks @ablaom for the quick response!
My problem is specifically with evaluate!-ing the composed model (for cross-validation).

Here is a simplified code example:

module Example

using MLJBase
using RDatasets: dataset
using MLJXGBoostInterface

# ---
mutable struct ExampleTransformer <: Static
end

function MLJBase.transform(self::ExampleTransformer, _, X)
    # drop the last 10 rows
    selectrows(X, 1:size(X)[1] - 10)
end

# ---
mutable struct ExampleComposed <: ProbabilisticNetworkComposite
    transformer :: ExampleTransformer
    classifier  :: XGBoostClassifier
end

function MLJBase.prefit(self::ExampleComposed, verbosity, X, y)
    Xs = source(X)
    ys = source(y)

    mach1 = machine(:transformer)
    X1    = transform(mach1, Xs)

    # --------
    # here I change the size of the `target`
    y1 = node((y -> y[1:end - 10]), ys) # <---
    # --------

    mach2 = machine(:classifier, X1, y1)
    yhat  = predict(mach2, X1)
    return (;
        predict = yhat,
    )
end


function test()
    iris = dataset("datasets", "iris")
    y, X = unpack(iris, ==(:Species))
    y = categorical(map((x -> x == "setosa"), y))

    m = machine(
        ExampleComposed(
            ExampleTransformer(),
            XGBoostClassifier()
        ),
        X, y)

    m |> fit!

    # ==================================================
    # The error I get here -

    # ERROR: DimensionMismatch: Encountered two objects with sizes (15,) and (25,) which needed to match but don't.
    # ==================================================
    evaluate!(m, measure=auc)
end

end

The error that I get is:

ERROR: DimensionMismatch: Encountered two objects with sizes (15,) and (25,) which needed to match but don't. 
Stacktrace:
  [1] check_dimensions
    @ ~/.julia/packages/MLJBase/ByFwA/src/utilities.jl:145 [inlined]
  [2] _check(measure::AreaUnderCurve, yhat::UnivariateFiniteVector{…}, y::CategoricalArrays.CategoricalVector{…})
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/measures/measures.jl:60
  [3] Measure
    @ ~/.julia/packages/MLJBase/ByFwA/src/measures/measures.jl:132 [inlined]
  [4] value
    @ ~/.julia/packages/MLJBase/ByFwA/src/measures/measures.jl:202 [inlined]
  [5] value
    @ ~/.julia/packages/MLJBase/ByFwA/src/measures/measures.jl:196 [inlined]
  [6] (::MLJBase.var"#326#332"{…})(m::AreaUnderCurve, op::Function)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1237
  [7] #4
    @ ./generator.jl:36 [inlined]
  [8] iterate
    @ ./generator.jl:47 [inlined]
  [9] collect(itr::Base.Generator{Base.Iterators.Zip{Tuple{…}}, Base.var"#4#5"{MLJBase.var"#326#332"{…}}})
    @ Base ./array.jl:834
 [10] map(::Function, ::Vector{AreaUnderCurve}, ::Vector{typeof(predict)})
    @ Base ./abstractarray.jl:3409
 [11] fit_and_extract_on_fold
    @ ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1230 [inlined]
 [12] (::MLJBase.var"#307#308"{MLJBase.var"#fit_and_extract_on_fold#330"{…}, Machine{…}, Int64})(k::Int64)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1056
 [13] _mapreduce(f::MLJBase.var"#307#308"{…}, op::typeof(vcat), ::IndexLinear, A::UnitRange{…})
    @ Base ./reduce.jl:440
 [14] _mapreduce_dim
    @ ./reducedim.jl:365 [inlined]
 [15] mapreduce
    @ ./reducedim.jl:357 [inlined]
 [16] _evaluate!(func::MLJBase.var"#fit_and_extract_on_fold#330"{…}, mach::Machine{…}, ::CPU1{…}, nfolds::Int64, verbosity::Int64)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1055
 [17] evaluate!(mach::Machine{…}, resampling::Vector{…}, weights::Nothing, class_weights::Nothing, rows::Nothing, verbosity::Int64, repeats::Int64, measures::Vector{…}, operations::Vector{…}, acceleration::CPU1{…}, force::Bool, logger::Nothing, user_resampling::CV)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1259
 [18] evaluate!(::Machine{…}, ::CV, ::Nothing, ::Nothing, ::Nothing, ::Int64, ::Int64, ::Vector{…}, ::Vector{…}, ::CPU1{…}, ::Bool, ::Nothing, ::CV)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1335
 [19] evaluate!(mach::Machine{…}; resampling::CV, measures::Nothing, measure::AreaUnderCurve, weights::Nothing, class_weights::Nothing, operations::Nothing, operation::Nothing, acceleration::CPU1{…}, rows::Nothing, repeats::Int64, force::Bool, check_measure::Bool, verbosity::Int64, logger::Nothing)
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1015
 [20] test()
    @ Main.Example /workspaces/blab/src/Example.jl:59
 [21] top-level scope
    @ REPL[71]:1

That’s a nicely formulated example that explains your issue very well, thank you.

The problem is that your new model ExampleComposed violates a basic assumption required for use with evaluate!: if yhat = predict(m, X), then yhat must have the same number of observations as X. This is unavoidable, for how else is the routine to know which ground-truth observations are to be paired with which predictions?

You may want to rethink your approach, although one remedy comes to mind: alter your model so that the “missing” predictions are padded with missing. Then, provided you use a measure that supports missing predictions, you can use evaluate!. Unfortunately, auc does not support missing predictions (feel free to open an issue), so I’ve substituted brier_score in the following adaptation of your example:

using Pkg
Pkg.activate(temp=true)
Pkg.add(["MLJBase", "RDatasets", "MLJModels", "StatisticalMeasures"])
using MLJBase, MLJModels
using RDatasets: dataset
using StatisticalMeasures

# ---
mutable struct ExampleTransformer <: Static
end

function MLJBase.transform(self::ExampleTransformer, _, X)
    selectrows(X, 1:size(X)[1] - 10)
end

model = ExampleTransformer()
mach = machine(model)

# ---
mutable struct ExampleComposed <: ProbabilisticNetworkComposite
    transformer
    classifier
end
composite = ExampleComposed(ExampleTransformer(), ConstantClassifier())

function MLJBase.prefit(self::ExampleComposed, verbosity, X, y)
    Xs = source(X)
    ys = source(y)

    mach1 = machine(:transformer)
    X1    = transform(mach1, Xs)

    # --------
    # here I change the size of the `target`
    y1 = node((y -> y[1:end - 10]), ys) # <---
    # --------

    mach2 = machine(:classifier, X1, y1)
    yshort  = predict(mach2, X1)
    # pad the predictions with `missing` so that yhat has as many
    # observations as the data entering the network
    yhat = node(yshort) do y
        vcat(y, fill(missing, 10))
    end
    return (;
        predict = yhat,
    )
end

function test()
    iris = dataset("datasets", "iris");
    y, X = unpack(iris, ==(:Species));
    y = categorical(map((x -> x == "setosa"), y));

    m = machine(
        ExampleComposed(
            ExampleTransformer(),
            ConstantClassifier(),
        ),
        X, y)

    m |> fit!

    evaluate!(m, measure=brier_score)
end

Caveat. You should understand how StatisticalMeasures handles aggregation in the presence of missing values: the missing scores are skipped, but they still count towards the normalisation. For example, the Brier scores above will be the sum of the non-missing per-observation scores, divided by 150, not 140.
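In plain Julia terms (with hypothetical per-observation scores, just to illustrate the convention):

scores = [rand(140); fill(missing, 10)]    # 150 per-observation scores, 10 missing
sum(skipmissing(scores)) / length(scores)  # divided by 150, not 140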

Thanks. So maybe I’m missing something about how evaluate! works in this situation.
In my example m |> fit! works, and it’s using predict().

In my prefit I’m doing

    ...
    mach1 = machine(:transformer)
    X1    = transform(mach1, Xs)
    ...

which changes the number of observations (to size(X)[1] - 10), the same way I do for the target (y).

How is evaluate! different from fit! in this case?

I’m simply saying that you cannot evaluate any model (whether it comes from a learning network or whatever) if the assertion below can fail:

mach = machine(model, X, y) |> fit!
yhat = predict(mach, Xnew)
@assert nrows(yhat) == nrows(Xnew)
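Schematically, each fold of evaluate! does something like this (a simplified, self-contained sketch, not the actual MLJBase internals; ConstantClassifier is just a stand-in model):

using MLJBase, MLJModels, StatisticalMeasures

X = (x = rand(100),)
y = categorical(rand(Bool, 100))
mach = machine(ConstantClassifier(), X, y)

for (train, test) in MLJBase.train_test_pairs(CV(nfolds=5), 1:100)
    fit!(mach, rows=train, verbosity=0)
    yhat = predict(mach, rows=test)    # must have exactly length(test) observations
    @show brier_score(yhat, y[test])   # the pairing fails if the lengths differ
end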

My proposal to pad the output yhat with missings, in the implementation of model = ExampleComposed(...), resolves this particular problem. There may be another solution, but that’s one.

I see, thanks.
So it sounds to me like these abstractions (machine, fit!, predict, evaluate!, etc.) are not well suited to changing the number of rows of X (and y).

My goal with learning networks was to hide the transformer that changes nrows (it computes sliding windows) under the hood, and to create a new model type that could be fit!-ed, predict-ed and evaluate!-d like any other model.

In any case, thanks for your help!