How do you train a machine on several datasets

hpaldan · August 18, 2022, 8:35pm

Hello!
I’m new to the MLJ package, but I gather that a fundamental aspect is that when you create a machine object you also define the data for the object. I have several very large dataframes that I would like to use to train a model, and separate datasets that I want to transform with the machine object in order to analyze them (the goal is unsupervised learning for anomaly detection).

So, can you add more training data to a machine object after it has been created, and fit it to the new data as well? And is it possible to transform new data after fitting?

ablaom · August 21, 2022, 9:24pm

Thanks @hpaldan for your query and for giving MLJ a try.

To paraphrase your questions as I understand them:

1. Does MLJ support incremental learning (updating learned parameters based on new data)?
2. Can a machine bound to an unsupervised model, trained on data X, be used to transform new data Xnew?

Do I understand correctly?

The answer to 1. is currently no. You can add iterations to a machine bound to an iterative model (e.g., EvoTreeClassifier), but not new data.
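Since a machine is bound to its data when it is constructed, one possible workaround (assuming the combined data fits in memory) is to stack the separate datasets into a single table before creating the machine. The sketch below uses only Base Julia and column tables (named tuples of vectors); `stack_tables`, `X1`, `X2`, and `Xall` are hypothetical names chosen for illustration, not part of MLJ. With DataFrames, `vcat(df1, df2)` plays the same role.

```julia
# Hypothetical helper: stack several column tables (NamedTuples of vectors)
# that share the same column names into one table.
function stack_tables(tables...)
    colnames = keys(first(tables))
    all(t -> keys(t) == colnames, tables) ||
        throw(ArgumentError("all tables must have the same column names"))
    # Concatenate each column across all tables, preserving row order.
    cols = map(name -> reduce(vcat, [getproperty(t, name) for t in tables]), colnames)
    return NamedTuple{colnames}(cols)
end

# Two small illustrative datasets with identical columns.
X1 = (a = [1.0, 2.0], b = [3.0, 4.0])
X2 = (a = [5.0], b = [6.0])

Xall = stack_tables(X1, X2)

# The combined table can then be bound to a machine in the usual way, e.g.:
# mach = machine(model, Xall) |> fit!
```

This trains on all the data in one pass; it is not incremental learning, so adding a further dataset later means rebuilding the table and refitting.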

The answer to 2. is yes and there are many examples around in the MLJ learning resources. Here’s another:

using MLJ

PCA = @iload PCA pkg=MultivariateStats

X, y = @load_iris # a table and a vector

model = PCA(maxoutdim=2)
mach = machine(model, X) |> fit!

Xnew = (sepal_length = [6.4, 7.2, 7.4],
        sepal_width = [2.8, 3.0, 2.8],
        petal_length = [5.6, 5.8, 6.1],
        petal_width = [2.1, 1.6, 1.9],)

# training data transformed:
transform(mach, X)

# new data transformed:
transform(mach, Xnew)

If you are transitioning from another ML platform (e.g., scikit-learn or R) you may find this useful: MLJ for Data Scientists in Two Hours

hpaldan · August 22, 2022, 6:18am

Yes! Those are exactly my questions, thank you! Somehow I must have missed the examples with transforming new data; I only saw examples where partition is used. I will look through the tutorial, thanks again!
