I am running a MLJ pipeline on a quite large dataset and I want to be able to save the model once, and then use it in later sessions without having to retrain it. I have tried following the documentation regarding saving models and that works fine. However, I am quite puzzled at the fact that the size of the saved model changes depending on the size of the dataset.

A MWE below:

```
using DataFrames
using MLJ
function get_data(N_rows)
df = DataFrame(rand(N_rows, 3), :auto)
df.x4 = rand(["A", "B", "C"], N_rows)
df.y = rand([0, 1], N_rows)
df = coerce(df, :y => Binary, :x4 => Multiclass)
X = select(df, Not(:y))
y = df.y
return X, y
end
N_rows = 1000
X, y = get_data(N_rows);
LogReg = @load LogisticClassifier pkg = MLJLinearModels
pipe_logreg = @pipeline(OneHotEncoder, LogReg(penalty = :none, fit_intercept = false),)
mach = machine(pipe_logreg, X, y)
fit!(mach)
MLJ.save("pipeline_test_$N_rows.jlso", mach)
```

The size of the saved model is 227 KB when `N_rows = 1_000`

and 1.7 MB when `N_rows = 10_000`

, yet they both save the same number of parameters (3 numerical coefficients and 3 coefficients for the categories). Or, thatâ€™s at least what I expect, but why does the size really change so drastically with more data? I do not expect MLJ to save the input data as well as that would be quite unintuitive.

This is run with Julia v1.7.0, MLJ v0.16.11, MLJBase v0.18.26, and MLJLinearModels v0.5.7.

Cheers!