Again, thanks to all for your comments.
> Maybe it is intended that this interface is not the interface that the end user will see. Maybe it is intended that a higher-level ML API will be built on top of this.
Yes, @CameronBieganek, this is indeed the intended purpose.
There are three key stakeholders in the design of a low-level machine learning API:

- **developer**: someone adding model-generic functionality, such as hyperparameter optimization, model search, iterative model control, etc.

- **implementer**: someone implementing the interface for some existing model, because they want to buy into the functionality made available by some developer. The existing model will have a "native" interface, likely designed independently of the low-level API.

- **user**: someone who interacts directly with a model, and not through some high-level API.
The existing design prioritizes the concerns of the developer and the implementer. The result, I completely agree, is an API falling short of the requirements for direct user interaction. The suggested API of @CameronBieganek is undoubtedly superior for this. But from the point of view of the implementer/developer it is suboptimal, because it requires new methods (a method to extract learned parameters from the options/fitted-parameters/report conglomerate, possibly another to extract the report), in addition to extra gymnastics to ensure `predict` dispatches on both the conglomerate and the stripped-down learned parameters.
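To make the gymnastics concrete, here is a hypothetical sketch of what the conglomerate-style API asks of an implementer. None of these names come from either proposal; they are invented for illustration only:

```julia
# Hypothetical sketch: the "conglomerate" object bundles everything `fit`
# produces. All names here are invented for illustration.
struct FittedModel{O,P,R}
    options::O          # hyperparameters
    params::P           # learned parameters
    report::R           # byproducts of training
end

# Extra accessor methods the implementer must now supply:
learned_params(m::FittedModel) = m.params
training_report(m::FittedModel) = m.report

# And `predict` must be arranged to dispatch on both the conglomerate and
# the stripped-down learned parameters:
predict(m::FittedModel, Xnew) = predict(learned_params(m), Xnew)
predict(params::Vector{Float64}, Xnew::Matrix{Float64}) = Xnew' * params
```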
The idea of making `fit!` mutating has been objected to by @juliohm, and I would also prefer to avoid this.
It seems to me that adoption of the API depends critically on implementation being as
simple as possible (a minimum of compulsory methods) and as unobtrusive as possible (a
minimum of new structs, no abstract types to subtype). For comparison, Tables.jl is a
low-level API with wide adoption but it is not particularly user-friendly (think of
extracting column names).
> But that doesn’t feel right to me, because LearnAPI is claiming some very fundamental names like `fit` and `predict`.
Perhaps it was not clear, but these names are not exported by LearnAPI.jl, and are not intended to be exported by any package, or overloaded, except by model implementations.
@CameronBieganek, @jeremiedb I wonder if the API becomes more palatable if LearnAPI.jl is bundled with a lightweight "user interface".
> I’m not a fan of passing around `nothing` in 90% of cases because you don’t need one of the inputs or outputs that the API specifies, e.g. `state = nothing` for a model fit or `report = nothing` for a model prediction.
I agree that a method returning `nothing` in most cases feels clunky, but it avoids complicating case distinctions for the developer. One of the key pain points in MLJ development was how to accommodate a model like DBSCAN clustering, which does not generalize to new data (has no learned parameters) but nevertheless has output, separate from the transformation itself (the labels), that you would like exposed to the user. To that end, we introduced the possibility that an operation (such as `transform` or `predict`) can output a report item, but to make this non-breaking we introduced a trait to flag the fact that `transform`'s output was in two parts. In the end we wound up breaking developer code we didn't know about, and there was understandable push-back against such complicated behavior in a low-level method.
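As a sketch of the fixed-return-value convention under discussion (assumed signatures, not the settled API), `fit` always returns the same tuple shape, with `nothing` filling unused slots, so a non-generalizing model like DBSCAN can still surface extra output through the report:

```julia
# Hypothetical sketch of the convention: every `fit` returns the same tuple
# shape, using `nothing` for slots a given model doesn't need.
struct DBSCAN            # hyperparameters only (a "model" in the LearnAPI sense)
    eps::Float64
    min_points::Int
end

function fit(model::DBSCAN, verbosity, X)
    labels = zeros(Int, size(X, 2))     # stand-in for a real clustering computation
    fitted_params = nothing             # DBSCAN does not generalize to new data
    state = nothing
    report = (labels = labels,)         # extra output exposed to the user
    return fitted_params, state, report
end
```

The developer can then destructure every model's output the same way, with no case distinctions.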
> I would much prefer `verbosity` to be a keyword argument, rather than a positional argument that occurs before `X` and `y`.
I suppose if we extend the notion of "metadata" to include `verbosity`, this would address your concern. Conceptually this feels a bit of a stretch. We'd have to worry about `verbosity` in every implementation of the optional data interface, which could be annoying. Again, this feels like something we're only doing for user-friendliness, but I'll consider it further.
> “A model is just a container for hyper-parameters.”
>
> This seems like a misnomer to me.
The use of the word "model" in LearnAPI.jl for the hyperparameters struct coincides with its use in MLJ and its documentation. Objections to this use have been raised a few times. I'd be happy to change it here; it's probably too late for MLJ. I'd prefer a name for the hyperparameters struct that is not pluralized, unlike "hyperparameters". How about "strategy"?
> I think Flux's definition of a Model fits the bill well here.
@jeremiedb I disagree. The conflation of hyperparameters (learning strategy) and learned
parameters (weights) in a Flux model, while elegant, is not universally satisfactory, as I
think the existence of Lux.jl establishes.
> I’d also stress the importance for the API to allow for performance-first options where one needs it. For example, not force the computation of `features_importance` if not needed.
Good point. Generally in MLJ models, a hyperparameter is introduced to control whether some non-essential computation is carried out, if that computation is likely to incur a performance penalty. If the user opts out, then the accessor function (e.g., the one returning feature importances) returns `nothing`. How does that sound, @jeremiedb? Perhaps you have different ideas?
> Perhaps more critical is for iterative models, such as GBT, to be able to efficiently track a metric on an eval dataset. Under the current MLJ design, such a metric must be computed from scratch each time the evaluation is desired, which can significantly alter performance, since inference from the full N trees must be computed each time, instead of just the residual ones since the last eval. I’m not clear how best to support such tracking; perhaps optional `deval` / `x_eval` / `y_eval` kwargs to `fit` could do it?
Here @jeremiedb is referring to the kind of external control of iterative models implemented by MLJ's model-iteration wrapper, using out-of-sample estimates of model performance for early stopping, for example.
This interesting use case sounds specific to ensemble models, but I think we can handle it using the proposed API if we add one accessor function. First, we regard the evaluation data as "metadata" (because it is not itself going to be sub-sampled, so is not "data" in the LearnAPI.jl sense), and so it is specified in `fit`, as suggested, using keyword arguments. This provides an interface point for the evaluation data. But the external controller also needs access to the internally computed predictions on the evaluation set, which we provide by adding the (optional) LearnAPI accessor function `out_of_sample_predictions(model, state, report)`. We arrange `fit` to record the individual atomic model predictions in `state`, and our new accessor function returns the complete ensemble prediction (or `nothing` if evaluation data has not been provided). If `out_of_sample_predictions` is not implemented (is not flagged in the `LearnAPI.functions` trait), or if it returns `nothing`, the external controller computes the out-of-sample predictions externally, "from scratch".

How does that sound, @jeremiedb?
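A sketch of how an ensemble implementation might support this proposed accessor. The accessor itself is only a proposal at this point, and all concrete names and shapes below are illustrative:

```julia
# Illustrative sketch of the proposed (not yet existing) accessor function.
struct GBTree            # hyperparameters struct
    n_trees::Int
end

function fit(model::GBTree, verbosity, X, y; X_eval=nothing, y_eval=nothing)
    fitted_params = fill(1.0, model.n_trees)        # stand-in for real trees
    # Record per-tree predictions on the evaluation set in `state`, so they
    # need not be recomputed from scratch by an external controller:
    per_tree_preds = X_eval === nothing ? nothing :
        [fill(0.1, length(y_eval)) for _ in 1:model.n_trees]
    state = (per_tree_preds = per_tree_preds,)
    report = nothing
    return fitted_params, state, report
end

# The proposed accessor: returns the accumulated ensemble prediction,
# or `nothing` when no evaluation data was supplied to `fit`.
function out_of_sample_predictions(model::GBTree, state, report)
    state.per_tree_preds === nothing && return nothing
    return reduce(+, state.per_tree_preds)
end
```

An external early-stopping controller would call the accessor after each iteration, falling back to computing predictions itself when it gets `nothing`.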
> Maybe related to the above, but I’m not sure I properly grasp the difference in scope of `update!` and `ingest!` mentioned in LearnAPI. I’m wondering if a single `fit!` could be both sufficient and lighten the API verbs.
Conceptually these strike me as different, so separate verbs are appropriate, no?
> One thing that I really dislike about the current state of affairs is the inconsistency with output types. Models which are probabilistic may output distributions, categories, integers, etc., and that is a pain to post-process. Moreover, developers of models are lost with so much flexibility and end up choosing whatever they feel is more natural. As a result, end-users struggle to write generic scripts that accept any kind of model.
@juliohm I agree, but I'm suggesting that the responsibility for nailing down the allowed representation should lie with a higher-level interface (such as MLJ, which indeed tries to do this). For example, such an interface could require that if the predict proxy trait has the value `LearnAPI.Distribution()`, then the output of `predict` must implement the `pdf` method from Distributions.jl.
Part of the problem is that agreement about “best representations” of data is still a bit
fluid in Julia. So I’m reluctant to lock this in at this low-level. What is provided are
traits to articulate what the model output actually looks like, either in terms of
scientific types, ordinary types, or individual observation types/scitypes.
What do others think about this?
> I think we should try to reduce the noise with output types as much as possible, and make sure that API functions like `predict` have a well-known (and fixed) output scientific type. We can then add extra API functions like `predictprob` for models that support variations of the output.
I'm not sure I properly understand this part. Are you suggesting that:

1. every model that computes a proxy for the target (such as a probability distribution, confidence interval, survival probability, etc.) should be required to also compute actual target values; and

2. `predict` should be reserved for actual target predictions, and not the target proxy?
The problem I have here is that computing actual target values may require secondary computations, and new input from the user. For example, in probabilistic programming, it is common to return a "sampleable" object representing a probabilistic target, in lieu of a concrete target prediction. To get a point value requires sampling the object. Also, we need to decide, in any kind of probabilistic predictor, whether we want the mode, median, or mean; or maybe we should apply a probability threshold, to be learned using evaluation data; how many random samples do we take from our sampleable object? And so forth.
One limitation of the current proposal, which may be related to semantic concerns you have (I'm guessing here), is that a model can only `predict` one kind of target proxy. And this seems reasonable, since most models have a single proxy type as the object of their computation; everything else is generally post-processing. However, if it would create less cognitive dissonance, we could make the target proxy type an argument of `predict`, with a model having the option to implement more than one:
```julia
LearnAPI.predict(model::MyModel, fitted_params, Xnew, ::LearnAPI.TrueTarget)   # -> actual target
LearnAPI.predict(model::MyModel, fitted_params, Xnew, ::LearnAPI.Distribution) # -> probabilistic prediction
```
Or, as in Python, we could have a plethora of dedicated operations, `predict_survival_probability`, etc., one for each of the 16 different proxy types already identified, a list that is likely to keep growing. Would others prefer this?
> Also, please avoid macros if they are still present in LearnAPI.jl. I didn’t find them in the docs, but would like to just point this out before it is too late.
@juliohm There is indeed a convenience macro, `@trait` (the only exported name), which provides a shorthand for declaring traits. There's an example here, and the code is here. It seems innocuous enough to me. What do others think?
> How wide is the scope of this API? Is it primarily statistical models, or would it also include other types, like symbolic regression and so forth?
@DoktorMike, you can get a rough idea of the intended scope from this list (which will likely be extended). I think symbolic regression would be fine.
> In the search for something lighter and simpler than MLJ, I’ve found Invenia’s Models.jl quite useful, which nicely separates untrained models (templates) from trained models. Some of the ideas there may be helpful in this effort.
Models.jl is nice, but it appears to require that models subtype a Models.jl abstract type, and we are trying to avoid that. Also, it provides only a single "operation", while I have found it useful and natural to have a variety of methods as well. This is one of the features of sk-learn that I like.
> Also, a project that I’m involved in has models being written to disk/database after training, then read into memory in several separate processes (in parallel) to be used (predicted) in long-running simulations. That is, each model is trained in 1 process and subsequently used in several parallel processes. Ideally the serialization format would use something not specific to Julia, such as JSON or something similar. You’ve probably got this use case covered, but worth mentioning anyway.
@jocklawrie Mmmm. My feeling is that responsibility for serialization should live at a higher level. What is missing, but planned, is a model-specific method to convert the `fitted_params` to a "serializable" form, by which I mean a form that is persistent (not, for example, a C pointer) and anonymized. For most models `fitted_params` is already serializable, but this is not universally the case. And then there would be a method to restore a deserialized object to a form needed by `predict` and other operations.
> It [the LearnAPI.jl documentation] also starts with a statement that I disagree with: “Machine learning algorithms, also called models, have a complicated taxonomy. Grouping models, or modeling tasks, into a relatively small number of types, such as ‘classifier’ and ‘clusterer’, and attempting to impose uniform behavior within each group, is challenging.” Machine learning algorithms are not the same concept as machine learning models. A learning algorithm is used to learn the parameters of a learning model (e.g. maximum likelihood estimation can be used to learn the coefficients of a linear model).
I’m happy to stand corrected on the distinction between models and algorithms. But
otherwise I stand by the opening statement. This is the central point really.
> These models fit in well-known categories with well-defined behavior.
@juliohm This may be so, but the number of such categories is very large. For example, not all clusterers are the same. Some generalize to new data (and will implement `fit`) but some don't; most compute ordinary labels (`predict_proxy` will have the value `LabelAmbiguous()`) but some predict "soft" (probabilistic) labels (`LabelAmbiguousDistribution()`). It may ultimately be useful to define Clusterer as a LearnAPI model with behavior varying within such-and-such bounds (articulated via LearnAPI traits), but I don't think this should happen in LearnAPI itself.
> In the case of supervised models, are there other required API functions other than `fit` and `predict`?
Yes. Since a supervised model has the concept of a target variable, with `predict` outputting the target or a target proxy, you should make a `predict_proxy` trait declaration (see here and here) and a `position_of_target` declaration (see here). But that's it. Finally, you must declare which methods you have explicitly overloaded (the `functions` trait). Optional traits include promises about the scitype of training data (the `target_fit_scitype` trait), or whether per-observation weights are supported.
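Putting that together, a minimal supervised implementation might look roughly like this. The `fit`/`predict` signatures follow those quoted earlier in this thread, but `MyRidge` is invented, and the trait-overloading style shown in the comments is an assumption (the `@trait` macro mentioned above provides a shorthand for such declarations):

```julia
using LinearAlgebra  # for the identity `I` in the ridge solve

# Hypothetical minimal supervised model. All names other than the verbs and
# traits discussed in this thread are invented for illustration.
struct MyRidge           # hyperparameters struct (the "model")
    lambda::Float64
end

function fit(model::MyRidge, verbosity, X, y)
    # Ridge regression solve; X is assumed features-by-observations here.
    w = (X * X' + model.lambda * I) \ (X * y)
    return w, nothing, nothing          # fitted_params, state, report
end

predict(model::MyRidge, fitted_params, Xnew) = Xnew' * fitted_params

# Required trait declarations (exact form is an assumption):
# LearnAPI.predict_proxy(::Type{MyRidge}) = LearnAPI.TrueTarget()
# LearnAPI.position_of_target(::Type{MyRidge}) = 2
# LearnAPI.functions(::Type{MyRidge}) = (:fit, :predict)
```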