MLJ.save() and restoring with machine() don't work

Hello!

I made a simple XGBoost model with the pipeline below, fitted it to data, saved it, and then tried to restore it, but when restoring it (even in the same notebook) I get an error when trying to apply `predict`. Any idea how to solve it?

using MLJ  # provides @load, machine, fit!, MLJ.save, predict_mode, ...

XGBC = @load XGBoostClassifier
xgb = XGBC()
ohe = OneHotEncoder()

# Pipeline: OneHotEncoder |> XGBoost
xgb_pipe = ohe |> xgb

# Unpack into target y and feature table X:
y, X = unpack(df, ==(:y_label), col -> true)

train, test = partition(1:length(y), 0.1, shuffle=true)

xgbm = machine(xgb_pipe, X, y, cache=false)
fit!(xgbm, rows=train, verbosity=0)

MLJ.save("mach_xgb_pipe.jls", xgbm)

# Restoring the machine and using it for predictions:
mach_restored = machine("mach_xgb_pipe.jls")

yhat = predict_mode(mach_restored, X[test, :])

Error message:

Error: Failed to apply the operation `predict` to the machine machine(:xg_boost_classifier, …), which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
│ Model (xg_boost_classifier):
│ input_scitype = Unknown
│ target_scitype = Unknown
│ output_scitype = Unknown
│ 
│ Incoming data:
│ arg of predict	scitype
│ -------------------------------------------
│ Node @818 → :one_hot_encoder	Table{AbstractVector{Continuous}}
│ 
│ Learning network sources:
│ source	scitype
│ -------------------------------------------
│ Source @791	Table{Union{AbstractVector{Continuous}, AbstractVector{Multiclass{10}}, AbstractVector{Multiclass{2}}, AbstractVector{Multiclass{89}}, AbstractVector{Multiclass{6}}}}
│ Source @496	AbstractVector{OrderedFactor{2}}
└ @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\composition\learning_networks\nodes.jl:153
XGBoostError: (caller: XGBoosterPredictFromDMatrix)
[14:39:55] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.

Stacktrace:
  [1] _apply(y_plus::Tuple{Node{Machine{Symbol, true}}, Machine{Symbol, true}}, input::DataFrame; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\composition\learning_networks\nodes.jl:159
  [2] _apply
    @ C:\Users\User\.julia\packages\MLJBase\mIaqI\src\composition\learning_networks\nodes.jl:144 [inlined]
  [3] (::Node{Machine{Symbol, true}})(Xnew::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\composition\learning_networks\nodes.jl:140
  [4] output_and_report(signature::MLJBase.Signature{NamedTuple{(:predict, :transform), Tuple{Node{Machine{Symbol, true}}, Node{Machine{Symbol, true}}}}}, operation::Symbol, Xnew::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\composition\learning_networks\signatures.jl:374
  [5] predict(model::MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :xg_boost_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}, fitresult::MLJBase.Signature{NamedTuple{(:predict, :transform), Tuple{Node{Machine{Symbol, true}}, Node{Machine{Symbol, true}}}}}, Xnew::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\operations.jl:191
  [6] predict(mach::Machine{MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :xg_boost_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}, false}, Xraw::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\operations.jl:133
  [7] predict
    @ C:\Users\User\.julia\packages\MLJTuning\drqMP\src\tuned_models.jl:795 [inlined]
  [8] predict_mode(m::MLJTuning.ProbabilisticTunedModel{Grid, MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :xg_boost_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}}, fitresult::Machine{MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :xg_boost_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}, false}, Xnew::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\interface\model_api.jl:11
  [9] predict_mode(mach::Machine{MLJTuning.ProbabilisticTunedModel{Grid, MLJBase.ProbabilisticPipeline{NamedTuple{(:one_hot_encoder, :xg_boost_classifier), Tuple{Unsupervised, Probabilistic}}, MLJModelInterface.predict}}, false}, Xraw::DataFrame)
    @ MLJBase C:\Users\User\.julia\packages\MLJBase\mIaqI\src\operations.jl:133
 [10] top-level scope
    @ In[113]:5

Side note: the model does work when “predict”/“predict_mode” is applied directly to the fitted machine, before saving.
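For example, this works on the machine fitted above, with no save/restore involved:

yhat_direct = predict_mode(xgbm, X[test, :])  # fine before saving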
I also tried, without success:

using JLSO

# mach_tuned_xgb_pipe wraps the same pipeline in a TunedModel (see stack trace above)
smach = serializable(mach_tuned_xgb_pipe)
JLSO.save("machine_serialized.jlso", :machine => smach)

loaded_mach = JLSO.load("machine_serialized.jlso")[:machine]
restore!(loaded_mach)

yhat = predict_mode(loaded_mach, X[test, :])

Thanks for helping!


Thanks for reporting. This issue has been reported elsewhere, and it's good to have your example, because I was previously unable to reproduce it. Here is the issue link: Serialized Composite Model Fails with XGBoost · Issue #927 · JuliaAI/MLJBase.jl · GitHub

Thanks a lot for the quick answer @ablaom !

I have just provided the non-working code, with an example CSV file attached, in the git issue (Serialized Composite Model Fails with XGBoost · Issue #927 · JuliaAI/MLJBase.jl · GitHub). Let me know if there is anything else I can do (and also when/if it gets solved!)

Thanks!

Note that a possible workaround is to use the gradient-boosted trees provided by the pure-Julia implementation EvoTrees.jl. These should serialise fine in pipelines.
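Something like this should work (a minimal sketch, assuming EvoTrees.jl is installed; the data names mirror the example above):

using MLJ

EvoC = @load EvoTreeClassifier pkg=EvoTrees
evo_pipe = OneHotEncoder() |> EvoC()

mach = machine(evo_pipe, X, y)
fit!(mach, rows=train)

MLJ.save("mach_evo_pipe.jls", mach)
mach_restored = machine("mach_evo_pipe.jls")
yhat = predict_mode(mach_restored, X[test, :])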

@ablaom, do you think this is something you will be able to solve in the near future (given your direction of efforts and/or priorities)? I'm pursuing an organizational shift towards Julia, and since xgboost is widely adopted here, I'll need either a workaround or may even have to switch to using XGBoost.jl directly (even though I find MLJ more user-friendly; on the other hand, I also can't get the xgb model to show feature importances when using MLJ, though that is probably due to my lack of coding knowledge).

Anyway, it was great to have such prompt answers from you! (thanks a lot for that!).

By the way, the easiest workaround I found is to keep using MLJ but break the model out of the pipeline: instead of xgb_pipe = ohe |> xgb, I treat the raw data with “ohe” in one machine, then feed the treated data to a second machine running xgb. It is then a matter of saving both machines (ohe and xgb), restoring them, and so on (see the sketch below). This way it works. The problem seems to be specific to pipelines…
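Roughly, the working two-machine version looks like this (a sketch with hypothetical file names; data names as in the original example):

# Encoder as its own machine:
mach_ohe = machine(OneHotEncoder(), X)
fit!(mach_ohe, rows=train)
W = MLJ.transform(mach_ohe, X)  # one-hot-encoded features

# Classifier as a second machine, fed the encoded data:
mach_xgb = machine(xgb, W, y)
fit!(mach_xgb, rows=train)

# Save both machines:
MLJ.save("mach_ohe.jls", mach_ohe)
MLJ.save("mach_xgb.jls", mach_xgb)

# Restore both and chain them manually at prediction time:
ohe_restored = machine("mach_ohe.jls")
xgb_restored = machine("mach_xgb.jls")
yhat = predict_mode(xgb_restored, MLJ.transform(ohe_restored, X[test, :]))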

do you think it is something you will be able to solve in the near future

I’ll try to look at it this week.

although I also can’t get the xgb model to show feature importances when using MLJ

Definitely raise an issue if you are having trouble. It’s likely that feature importances of a supervised model within a pipeline are not directly accessible, and so adding an issue to request this makes sense.
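For what it's worth, on a standalone (non-pipeline) XGBoost machine something like this should expose them (a sketch, assuming an MLJ version that provides feature_importances, and reusing the mach_xgb machine from the two-machine workaround above):

fi = feature_importances(mach_xgb)  # vector of feature => importance pairs
sort(fi, by=last, rev=true)         # most important features first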

Fix coming soon: Fix problem with serialization of nested models when component model overload `save`/`restore` by ablaom · Pull Request #960 · JuliaAI/MLJBase.jl · GitHub

This should be resolved by updating to latest versions (you need MLJBase 1.1.2). Feel free to re-open Serialized Composite Model Fails with XGBoost · Issue #927 · JuliaAI/MLJBase.jl · GitHub if some issue persists.
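To update and check (standard Pkg commands, nothing specific to this fix):

using Pkg
Pkg.update()
Pkg.status("MLJBase")  # should show at least 1.1.2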

@Paulo_Refosco Would you mind editing the post title to make it more specific? The issue was limited to XGBoost.jl models.

Thanks @ablaom !

As I mentioned in Serialized Composite Model Fails with XGBoost · Issue #927 · JuliaAI/MLJBase.jl · GitHub, saving and restoring a Pipeline including XGBoost using MLJ.save() is now working for me!

As for the title, I definitely don't mind changing it, but, maybe due to my profile here (or maybe because I don't know how to do it), it seems I am unable to change the titles of my topics… I'll look further into it, though.

I will consider raising the issue on feature importances. Let me also point out that saving and restoring the tuned xgb pipe model ended in the same error we were getting for the “regular” xgb pipe. I don't know whether that would really be expected to work on the tuned machine, though, so I'm just noting it.
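For reference, the tuned machine was built along these lines (a sketch only; the range, resampling, and measure below are illustrative assumptions, not the original code):

# Wrap the pipeline in a TunedModel; the component name
# :xg_boost_classifier is the one appearing in the error output.
r = range(xgb_pipe, :(xg_boost_classifier.max_depth), lower=3, upper=8)
tuned_pipe = TunedModel(model=xgb_pipe, tuning=Grid(), range=r,
                        resampling=CV(nfolds=5), measure=log_loss)

mach_tuned_xgb_pipe = machine(tuned_pipe, X, y)
fit!(mach_tuned_xgb_pipe, rows=train)

MLJ.save("mach_tuned_xgb_pipe.jls", mach_tuned_xgb_pipe)  # failed with the same error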

And again, thanks very much for all the efforts. It helped me a lot!

The above issue is resolved by MLJTuning 0.8.2.