MLJ: accessing model fields from resampler

I have a composite model with a parameter (alpha) that generates a probability distribution; the features and samples are then weighted according to this distribution. I also want the samples to be drawn according to the same parameter, and to do all of this with a self-tuning model.

I’ve implemented this as a learning network, following this example in the MLJ docs. The structure is:

X, y → transformers → regressor → inverse transform

The composite model has a parameter alpha that generates the weights for the transform. For the self-tuning, I also have a ResamplingStrategy that depends on the same alpha. I’ve tried embedding the resampler as a field in the model and passing it to TunedModel:

# LS <: ResamplingStrategy, stored as a field of the model

r = range(lkrr_model, :(LS.alpha), lower=1, upper=10, scale=:log);
self_tuning_regressor = TunedModel(model=lkrr_model,
                                   tuning=Grid(resolution=5),
                                   resampling=lkrr_model.LS,
                                   repeats=2,
                                   range=r,
                                   measure=rms);
tuned_kregressor = machine(self_tuning_regressor, K, y)
MLJ.fit!(tuned_kregressor)

However, every time alpha is changed, the change is not reflected in the resampler. I guess this is because the tuning function evaluates a clone of the model with the changed parameter. I can alter the resampler’s parameters at each call of train_test_pairs and the values are saved just fine, but I want the resampler to be linked to the model so that I can autotune.
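
For reference, the kind of strategy I mean looks roughly like this (a minimal sketch; AlphaSampler and the geometric weighting are placeholders, not my actual code):

using MLJBase, StatsBase

# Illustrative alpha-dependent resampling strategy
mutable struct AlphaSampler <: MLJBase.ResamplingStrategy
    alpha::Float64
    fraction_train::Float64
end

function MLJBase.train_test_pairs(s::AlphaSampler, rows)
    # weight each row by an alpha-dependent distribution
    # (a geometric decay here, as a stand-in)
    p = Weights([s.alpha^(-i) for i in eachindex(rows)])
    n_train = round(Int, s.fraction_train * length(rows))
    train = sample(rows, p, n_train; replace=false)
    return [(train, setdiff(rows, train))]
end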

I have also tried setting the probabilities as weights and passing them around with a machine. But still, when calling train_test_pairs(ls, rows, X, y, w), the X, y and w are the ones bound to the self-tuning machine (tuned_kregressor), prior to any transformation. If I could access the data after the transformation, it would be fine.

How do I autotune the resampler, or simply access the model’s fields from the resampler?


I’m not sure I understand the motivation for having the resampling depend on model parameters. Normally resampling depends only on the data, not the particular model.

I don’t know of a way to achieve what you’re trying to do with the MLJ interface. It sounds like you need a custom tuning strategy rather than a custom resampling strategy. You could take a look at the interface provided in MLJTuning.jl. However, after briefly looking through their README, it doesn’t appear to me that MLJTuning supports what you’re trying to do, either.

Without knowing more about your model, it seems like the kind of sample weighting you want should occur in the fit method rather than in train_test_pairs.
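
Something along these lines, perhaps (a very rough sketch; WeightedComposite, weights_from_alpha and the regressor field are stand-ins for whatever your model actually does):

import MLJModelInterface as MMI
using MLJ

function MMI.fit(model::WeightedComposite, verbosity, X, y)
    # derive the sample weights from alpha inside fit itself ...
    w = weights_from_alpha(model.alpha, X)   # hypothetical helper
    # ... then fit the inner, weight-supporting model with those weights
    mach = machine(model.regressor, X, y, w)
    fit!(mach, verbosity=verbosity)
    # keep the trained machine around as the fitresult for predict
    return mach, nothing, (; weights=w)
end

MMI.predict(model::WeightedComposite, fitresult, Xnew) = predict(fitresult, Xnew)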

Can you elaborate on why this approach does not work?

I’m not sure I understand the motivation for having the resampling depend on model parameters. Normally resampling depends only on the data, not the particular model.

Take, for instance, active sampling algorithms, where the model is first fitted on an initial set and the inputs with the highest uncertainty are then chosen to be labelled next. Here, the choice of the next set of samples depends on the fit result. To implement this in a resampler you would have to either access the fit result or fit the model again (hence needing access to the hyperparameters).
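
Schematically, one round looks something like this (illustrative names; here I use the predictive variance of a probabilistic model as the uncertainty score):

using MLJ, Distributions

# One round of active sampling: fit on the labelled rows, then
# pick the pool rows with the most uncertain predictions.
function active_round(model, X, y, labelled, pool, n_new)
    mach = machine(model, selectrows(X, labelled), y[labelled])
    fit!(mach, verbosity=0)
    yhat = predict(mach, selectrows(X, pool))  # probabilistic predictions
    uncertainty = var.(yhat)                   # predictive variance per point
    order = sortperm(uncertainty, rev=true)
    return pool[order[1:n_new]]                # rows to label next
end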

I don’t know of a way to achieve what you’re trying to do with the MLJ interface. It sounds like you need a custom tuning strategy rather than a custom resampling strategy. You could take a look at the interface provided in MLJTuning.jl. However, after briefly looking through their README, it doesn’t appear to me that MLJTuning supports what you’re trying to do, either.

I’d actually been looking at the MLJTuning code and I agree that I probably need a custom tuning strategy. Still, I’m not advanced enough yet for that, so I was hoping there was an easier way. Alternatively, I could skip MLJ’s autotuning and use something like Hyperopt.jl instead, since that would allow me to tune anything.
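
Something like this is what I have in mind (an untested sketch reusing the names from my first post; the grid over alpha is arbitrary, and it assumes LS is mutable):

using Hyperopt, MLJ

# Tune alpha outside MLJ, keeping the model and resampler in sync by hand
ho = @hyperopt for i = 25, alpha = exp10.(LinRange(0, 1, 25))
    model = deepcopy(lkrr_model)
    model.LS.alpha = alpha
    e = evaluate(model, K, y, resampling=model.LS, measure=rms, verbosity=0)
    e.measurement[1]   # Hyperopt minimizes the last expression
end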

Can you elaborate on why this approach does not work?

As in the learning network in the MLJ documentation, I have a transformation step and then a regressor that takes the transformed variables. Ideally, the transformer gives me the vector w containing the sampling probabilities/weights, and I can do this:

function MMI.fit(model::CompositeModel, verbosity, X, y, w=1)
    ys = source(y)
    Xs = source(X)
    yt = transform(model.lr, ys)   # transform the target
    Kt = transform(model.lr, Xs)   # transform the features
    ...
    yhat = inverse_transform(model.lr, zhat)
    ws = source(model.lr.weights)  # weights produced by the transformer
    mach = machine(Deterministic(), Xs, ys, ws; predict=yhat)
    return!(mach, model, verbosity)
end

However, the autotuning function does not pass the transformed variables in mach to the resampler. It passes the X, y and w that are initially bound to the autotuning machine, machine(self_tuning_regressor, X, y, w). The transformed variables are only used in training the model.

Hope I made myself clear, thanks for the help!


It seems like you’re trying to fit a square peg into a round hole. :slightly_smiling_face:

The resampling strategy interface in MLJ is designed for conducting model performance evaluation on fully trained models. (Although the resampling strategies sort of perform double duty, since you also have to pick one to use for TunedModel.) If you’re trying to do some sort of active sampling algorithm to learn your model, then I think it might make more sense to put all of that into the fit function. Even if training your model is an iterative process, the entire training process should normally be encapsulated in the fit function, so that the output of the fit function is a fully trained model.
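
Roughly speaking, something like this (hypothetical names throughout; the point is only that the sampling loop lives inside fit):

import MLJModelInterface as MMI

# Sketch: the whole active-sampling loop runs inside fit, so the
# returned fitresult is already a fully trained model.
function MMI.fit(model::MyActiveModel, verbosity, X, y)
    labelled = initial_rows(model, X)                  # hypothetical seed set
    local inner
    for round in 1:model.n_rounds
        inner = fit_inner(model, X, y, labelled)       # hypothetical inner fit
        new_rows = most_uncertain(inner, X, labelled)  # hypothetical scorer
        labelled = union(labelled, new_rows)
    end
    return inner, nothing, (; labelled=labelled)
end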

Hope that helps!


I ended up doing all of the sample selection inside the fit function and not using a resampler (I simply pass all samples to the fit function). Now I can automatically tune the sampling distribution parameter as I wanted.
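
For anyone finding this later, the tuning now uses an ordinary, data-only resampling strategy (a sketch with the names from my earlier posts, assuming alpha is now a direct field of the model; Holdout is just an example):

using MLJ

r = range(lkrr_model, :alpha, lower=1, upper=10, scale=:log);
self_tuning_regressor = TunedModel(model=lkrr_model,
                                   tuning=Grid(resolution=5),
                                   resampling=Holdout(),  # depends only on the data
                                   range=r,
                                   measure=rms);
mach = machine(self_tuning_regressor, K, y)
MLJ.fit!(mach)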

I’m actually porting some old code to the MLJ framework to more easily compare with other models and to learn, which is why I’m trying to do everything by the book. Thanks for the help! :smiley:
