Saving multiple MLJ machines to a single file?

xgdgsc · October 26, 2022, 5:41am

Is it possible to save/load multiple trained MLJ machines to/from a single file, to ease file management? Using JLD2 seems to generate a huge file (40GB with JLD2, compared to 40MB with MLJ.save).

ablaom · October 26, 2022, 7:55pm

Thanks for posting this question.

You can serialise however you want, so long as you appropriately preprocess (apply serializable) and post-process (apply restore!) as described here.

So, for bundling everything in one file, you can do something like this:

using MLJ

# should not be needed after https://github.com/alan-turing-institute/MLJ.jl/issues/975
import MLJBase: restore!, serializable


X, y = @load_iris

KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels

machs = map(2:10) do K
    model = KNNClassifier(; K)
    mach = machine(model, X, y) |> fit!
end

serializable_machs = serializable.(machs)

using JLSO
JLSO.save("machines.jlso", :machines => serializable_machs)

loaded_machs = JLSO.load("machines.jlso")[:machines]
restore!.(loaded_machs)

julia> foreach(loaded_machs) do mach
       loss = round(log_loss(predict(mach, X), y) |> mean, sigdigits=3)
       println("K=$(mach.model.K) \t training_loss=$loss")
       end
K=2      training_loss=0.0277
K=3      training_loss=0.0494
K=4      training_loss=0.0604
K=5      training_loss=0.0566
K=6      training_loss=0.0644
K=7      training_loss=0.074
K=8      training_loss=0.072
K=9      training_loss=0.0693
K=10     training_loss=0.0729

Does this address your issue?

xgdgsc · October 27, 2022, 2:18am

Thanks. Do you think serializable should be added to the MLJ cheatsheet?

ablaom · October 27, 2022, 10:45pm

Mmm. Not sure. I imagine the most common workflow for users is the simplified one, MLJ.save("my_machine.jls", mach) ... machine("my_machine.jls"), which is already in the cheatsheet.
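For reference, a minimal sketch of that simplified workflow for a single machine. The temporary file location and the choice K=5 are assumptions made for illustration, and NearestNeighborModels is assumed to be installed:

```julia
using MLJ

X, y = @load_iris
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels verbosity=0

mach = fit!(machine(KNNClassifier(K=5), X, y), verbosity=0)

# Illustrative file location (an assumption): a fresh temporary directory.
file = joinpath(mktempdir(), "my_machine.jls")

# Save the machine; training data is not written to the file.
MLJ.save(file, mach)

# Load it back; the restored machine is ready for prediction.
restored = machine(file)
yhat = predict(restored, X)
```

This writes one file per machine, which is why the bundling approach above needs serializable and restore! called explicitly.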

Returning to your earlier comment:

Using JLD2 seems to generate a huge file (40GB using JLD2 compared to 40MB when using MLJ.save).

I’m assuming this is because you did not apply serializable, which removes the training data (among other things), before saving with JLD2. If you did apply it, then this needs investigation. (The JLS-only simplified workflow takes care of this automatically.)
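As a rough sketch of the same idea without a third-party file format, the serializable / restore! pattern can be combined with Julia's standard-library Serialization module to bundle several machines in one file. This is one possible approach, not something MLJ prescribes; the file location and the range of K values are assumptions, and the explicit MLJBase import follows the earlier example (it may be unnecessary in newer MLJ versions):

```julia
using MLJ
using Serialization  # Julia standard library
import MLJBase: restore!, serializable

X, y = @load_iris
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels verbosity=0

machs = map(2:4) do K
    fit!(machine(KNNClassifier(; K), X, y), verbosity=0)
end

# Illustrative file location (an assumption): a fresh temporary directory.
file = joinpath(mktempdir(), "machines.jls")

# Pre-process each machine (drops training data), then write one file.
serialize(file, serializable.(machs))

# Read everything back and post-process each machine in place.
loaded_machs = deserialize(file)
restore!.(loaded_machs)

println("bundle size: ", filesize(file), " bytes")
```

Comparing filesize with and without the serializable step is a quick way to check whether training data is what inflates the file.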

xgdgsc · October 28, 2022, 7:18am

Is there a reliable way to compare (==) machines? I saved some machines using JLSO, loaded them back, and compared each machine before saving with its loaded counterpart. Every == comparison gives false, even though the output of fitted_params(mach) looks the same for both; == on the fitted_params output also gives false.

ablaom · October 31, 2022, 1:25am

You cannot presently use == for machines to conclude that two machines give the same predictions (transformations, etc.). Currently the model API makes no assumption about the meaning of fitresult1 == fitresult2 for the learned parameters fitresult output by fit(::Model, ...), so even if we overloaded == for machines, that probably wouldn’t give you what you want in all cases. If you have a strong use case for introducing a stronger requirement in the API, feel free to raise an issue at MLJModelInterface.jl. But as it would be some work to ensure all model implementations comply with the stricter requirement, that could take some time.
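In the meantime, one pragmatic workaround is to compare what the machines predict on some reference data rather than the machines themselves. The sketch below assumes a probabilistic classifier, so that predict_mode applies; same_predictions is a hypothetical helper written for this example, not part of MLJ, and agreement on one dataset does not prove the machines are identical:

```julia
using MLJ

X, y = @load_iris
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels verbosity=0

mach = fit!(machine(KNNClassifier(K=5), X, y), verbosity=0)

# Round-trip the machine through a file (illustrative temporary location).
file = joinpath(mktempdir(), "my_machine.jls")
MLJ.save(file, mach)
restored = machine(file)

# Hypothetical helper (not an MLJ function): two machines count as
# equivalent here if their point predictions agree on the data Xref.
same_predictions(m1, m2, Xref) =
    predict_mode(m1, Xref) == predict_mode(m2, Xref)

agree = same_predictions(mach, restored, X)
println("predictions agree: ", agree)
```

For deterministic models, predict would take the place of predict_mode, and for floating-point outputs an approximate comparison such as isapprox is likely more appropriate than ==.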
