I believe the Machines section of the manual addresses at least some of this, but let me reiterate, to hopefully sort out possible points of confusion. I will return to the poster's specific questions after this more general explanation. Suggestions for improving the documentation are most welcome.
In a normal MLJ workflow one constructs a machine with mach = machine(model, X, y), and the data X and y should not be mutated. However, mutation of model (whose fields are the model "hyper-parameters") is allowed. The outcome of fit!(mach) depends only on model (i.e., on the values of those hyper-parameters). The call fit!(mach) is not allowed to mutate model, so if a user does not change model, then the outcome (learned parameters) of a second fit! call is the same as the first. (Indeed, if model is unchanged, the lower-level MLJModelInterface.fit function is never even called.)
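To make this concrete, here is a minimal sketch, assuming the DecisionTree.jl interface package is installed (the model choice is just for illustration):

```julia
using MLJ

X, y = @load_iris  # built-in demo data

Tree = @load DecisionTreeClassifier pkg=DecisionTree
model = Tree()
mach = machine(model, X, y)

fit!(mach)  # trains, calling MLJModelInterface.fit under the hood
fit!(mach)  # model unchanged: training is skipped (a no-operation)
```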
If your model is iterative, with an iteration parameter called epochs, say, then increasing the number of epochs in the model means the next fit! call has a different outcome:
Clf = @load NeuralNetworkClassifier
model = Clf(epochs=10)
mach = machine(model, X, y)
fit!(mach) # train for 10 epochs
model.epochs = 12
fit!(mach) # train for 12 epochs total
However, whether training begins from scratch (“cold restart”) or
simply adds the necessary number of iterations to a partly trained
model (“warm restart”) depends on the implementation. Most packages
implementing MLJ’s model interface for an iterative model implement
MLJModelInterface.update in addition to
MLJModelInterface.fit, as a way of buying into “warm
restart”. (However, in principle, other kinds of hyper-parameter
changes may trigger a warm restart as well.)
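For model implementers, the warm-restart hook looks roughly like this. This is a schematic sketch, not a complete implementation: MyIterativeModel, train_more!, and the cache layout are all hypothetical; only the MLJModelInterface.update signature and its (fitresult, cache, report) return convention come from the interface.

```julia
import MLJModelInterface as MMI

function MMI.update(model::MyIterativeModel, verbosity,
                    old_fitresult, old_cache, X, y)
    # Hypothetical cache field: epochs already trained in previous calls.
    epochs_done = old_cache.epochs_done
    if model.epochs >= epochs_done
        # Warm restart: train only the additional epochs.
        fitresult = train_more!(old_fitresult, model.epochs - epochs_done, X, y)
        cache = (epochs_done = model.epochs,)
        report = nothing
        return fitresult, cache, report
    else
        # Fall back to retraining from scratch.
        return MMI.fit(model, verbosity, X, y)
    end
end
```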
Another way to change the outcome of fit! is to provide the rows=... keyword argument, which specifies what "view" of the data to train on, as in fit!(mach, rows=1:100). Whenever the view changes, a fit! call will detect this and make sure to retrain the model.
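Continuing the earlier example, something like:

```julia
fit!(mach, rows=1:100)    # train on the first 100 rows
fit!(mach, rows=1:100)    # no-op: same model, same view of the data
fit!(mach, rows=101:200)  # view changed, so the machine retrains
```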
There is other stuff going on behind the scenes, such as data caching,
but that’s another story.
Returning to the specific questions in the post:
when we call fit!() several times (it says “machine has been trained x times”, but does it try to improve the previous fitting (like a “gradient descent”)?
No. Not unless you specifically increase the iteration parameter.
Can we overfit a model by calling fit!() many times?
No, not unless you are also changing some hyper-parameter (e.g., a regularisation parameter or an iteration parameter).
Is there an interest to call fit!() several times on different data (like when we are performing a k-fold cross validation)?
Yes, you call fit! on different views of the data using the keyword
rows=..., as explained above. Ordinarily, however, there is
boilerplate code for this kind of thing. See the Evaluating Model
Performance section of the manual.
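For instance, k-fold cross-validation is usually performed with evaluate! rather than with manual fit! calls on row views (a sketch; the choice of measure here is just for illustration):

```julia
evaluate!(mach,
          resampling = CV(nfolds=6, shuffle=true),
          measure = log_loss)
```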
A simplified version of the
fit! code is presented in the manual. The
fit! method called
outside of learning networks is essentially the same thing. The
detailed logic (when to call
MLJModelInterface.fit, when to call
MLJModelInterface.update, and when to skip training altogether) is
detailed in the
fit_only! doc-string, which I quote below.
I hope this helps.
MLJBase.fit_only!(mach::Machine; rows=nothing, verbosity=1, force=false)
Without mutating any other machine on which it may depend, perform one of
the following actions to the machine
mach, using the data and model
bound to it, and restricting the data to
rows if specified:
Ab initio training. Ignoring any previous learned parameters and
cache, compute and store new learned parameters. Increment mach.state.
Training update. Making use of previous learned parameters and/or
cache, replace or mutate existing learned parameters. The effect is
the same (or nearly the same) as in ab initio training, but may be
faster or use less memory, assuming the model supports an update
option. Increment mach.state.
No-operation. Leave existing learned parameters untouched. Do not
increment mach.state.
Training action logic
For the action to be a no-operation, either
mach.frozen == true or
none of the following apply:
(i)
mach has never been trained (
mach.state == 0).
(ii)
force == true.
(iii) The
state of some other machine on which
mach depends has
changed since the last time
mach was trained (i.e., the last time
mach.state was incremented).
(iv) The specified
rows have changed since the last retraining and
mach.model does not have
Static type.
(v)
mach.model has changed since the last retraining.
In any of the cases (i) - (iv),
mach is trained ab initio. If only
(v) applies, then a training update is applied.
To freeze or unfreeze
mach, use
freeze!(mach) or
thaw!(mach).
The data to which a machine is bound is stored in
mach.args. Each element of
args is either a
Node object, or, in the case that
concrete data was bound to the machine, it is concrete data wrapped in
a
Source node. In all cases, to obtain concrete data for actual
training, each argument
N is called, as in
N() or
N(rows=rows), and either
MLJBase.fit (ab initio training) or
MLJBase.update
(training update) is dispatched on
mach.model and this data. See the
“Adding models for general use” section of the MLJ documentation for
more on these lower-level training methods.