Again, thanks to all for your comments.
> Maybe it is intended that this interface is not the interface that the end user will see. Maybe it is intended that a higher-level ML API will be built on top of this interface.
Yes, @CameronBieganek, this is indeed the intended purpose.
There are three key stakeholders in the design of a low-level machine learning API:

- the developer, which here means someone adding model-generic functionality, such as hyperparameter optimization, model search, iterative model control, etc.;
- the implementer, someone implementing the interface for some existing model because they want to buy into the functionality made available by some developer. The existing model will have a “native” interface, likely designed independently of the low-level API;
- the user, someone who interacts directly with a model, and not through some high-level interface.
The existing design prioritizes the concerns of the developer and the implementer. The result, I completely agree, is an API falling short of the requirements for direct user interaction. The suggested API of @CameronBieganek is undoubtedly superior for this. But from the point of view of the implementer/developer it is suboptimal, because it requires new methods (a method to extract learned parameters from the options-fitted-parameter-report conglomerate, possibly another to extract the report) in addition to extra gymnastics to ensure `predict` dispatches on both the conglomerate and the stripped-down learned parameters.
The idea of making `fit!` mutating has been objected to by @juliohm, and I would also prefer to avoid this.
It seems to me that adoption of the API depends critically on implementation being as
simple as possible (a minimum of compulsory methods) and as unobtrusive as possible (a
minimum of new structs, no abstract types to subtype). For comparison, Tables.jl is a
low-level API with wide adoption but it is not particularly user-friendly (think of
extracting column names).
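For a sense of what I mean, here is what extracting column names looks like with Tables.jl from generic (table-agnostic) code; perfectly serviceable, but nobody would call it user-facing. (A minimal sketch, using a named tuple of vectors as the table.)

```julia
using Tables

table = (a = [1, 2, 3], b = [4.0, 5.0, 6.0])   # any Tables.jl-compatible table

cols = Tables.columns(table)         # materialize a column-accessible object
names = Tables.columnnames(cols)     # -> (:a, :b)
```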
> But that doesn’t feel right to me, because LearnAPI is claiming some very fundamental names like fit and predict.
Perhaps it was not clear, but these names are not exported by LearnAPI and not intended to
be exported by any package, or overloaded, except by model implementations.
@CameronBieganek, @jeremiedb I wonder if the API becomes more palatable if LearnAPI.jl is bundled with a lightweight “user interface”, `UserAPI`?
> I’m not a fan of passing around nothing in 90% of cases because you don’t need one of the inputs or outputs that the API specifies, e.g. state = nothing for a model fit or report = nothing for a model prediction.
I agree that a method returning `nothing` in most cases feels clunky, but it avoids complicating case distinctions for the developer. One of the key pain points in MLJ development was how to accommodate a model like DBSCAN clustering, which does not generalize to new data (has no learned parameters) but nevertheless has output, separate from the transformation itself (the labels), that you would like exposed to the user. To that end, we introduced the possibility that `transform` (or `predict`) can output a `report` item, but to make this non-breaking we introduced a trait to flag the fact that `transform`’s output was in two parts. In the end we wound up breaking developer code we didn’t know about, and there was understandable objection to such complicated behavior in a low-level method.
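To see how the proposal handles this, here is how a DBSCAN-like model might expose its byproducts, given that the report slot is always there. (A sketch: `MyDBSCAN`, its field, and `my_dbscan` are invented for illustration, and I am assuming operations return `(output, report)`, consistent with the `nothing` discussion above.)

```julia
struct MyDBSCAN
    radius::Float64       # a hyperparameter
end

# no generalization to new data, so `fitted_params` plays no role here
function LearnAPI.transform(model::MyDBSCAN, fitted_params, X)
    labels, point_types = my_dbscan(X, model.radius)   # `my_dbscan` is a placeholder
    output = labels                                    # the transformation proper
    report = (point_types = point_types,)              # byproducts exposed to the user
    return output, report
end
```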
> I would much prefer verbosity to be a keyword argument, rather than a positional argument that occurs before X and y.
I suppose if we extend the notion of “metadata” to include `verbosity`, this would address your concern. Conceptually this feels a bit of a stretch. We’d have to worry about `verbosity` in every implementation of the optional data interface, which could be annoying. Again, this feels like something we’re only doing for user-friendliness, but I’ll consider it further.
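Concretely, the two options look something like this (signatures indicative only):

```julia
# current proposal: verbosity is positional, ahead of the data
LearnAPI.fit(model, verbosity, X, y)

# alternative: verbosity treated as "metadata" and passed as a keyword argument
LearnAPI.fit(model, X, y; verbosity = 1)
```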
> “A model is just a container for hyper-parameters.”
>
> This seems like a misnomer to me.
The use of the word “model” in LearnAPI.jl for the hyperparameters struct coincides with its use in MLJ and its documentation. Objections to this use have been raised a few times. I’d be happy to change it here; it’s probably too late for MLJ. I’d prefer a name for the struct that is not pluralized, which rules out “hyperparameters” and “options”. How about “strategy”?
> I think Flux definition of a Model fits well the bill here.
@jeremiedb I disagree. The conflation of hyperparameters (learning strategy) and learned
parameters (weights) in a Flux model, while elegant, is not universally satisfactory, as I
think the existence of Lux.jl establishes.
> I’d also stress the importance for the API to allow for performance first options where one needs it. For example, not force the computation of features_importance if not needed.
Good point. Generally in MLJ models, a hyperparameter is introduced to control whether some non-essential computation is carried out, whenever that computation is likely to incur a performance penalty. If the user buys out, then the accessor function (e.g., `feature_importance`) could return `nothing`. How does that sound, @jeremiedb? Perhaps you have a different suggestion.
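The pattern I have in mind is roughly the following. (A sketch: the model, its fields, the helper functions, and the accessor signature are all illustrative; the `(fitted_params, state, report)` return value is as discussed earlier in the thread.)

```julia
struct MyGBT
    nrounds::Int
    compute_feature_importance::Bool    # the "buy-out" hyperparameter
end

function LearnAPI.fit(model::MyGBT, verbosity, X, y)
    trees = train_trees(X, y; nrounds = model.nrounds)     # placeholder for real training
    importances = model.compute_feature_importance ?
        importance_from(trees) : nothing                   # skip the expensive part if bought out
    return (trees = trees,), nothing, (feature_importance = importances,)
end

# accessor returns `nothing` whenever the user has bought out of the computation
LearnAPI.feature_importance(model::MyGBT, fitted_params, report) =
    report.feature_importance
```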
> Perhaps more critical is for iterative models such as GBT to be able to efficiently track metric on a eval dataset. Under current MLJ design, such metric must be compute from scratch each time the evaluation is desired, which can significantly alter performance since inference from the full N trees must be computed each time instead of just the residual ones since last eval. I’m not clear how best to support such tracking, perhaps optional deval / x_eval / y_eval kwargs to fit could do it?
Here @jeremiedb is referring to the kind of external control of iterative models implemented by MLJ’s `IteratedModel`, using out-of-sample estimates of model performance for early stopping, for example.

This interesting use case sounds specific to ensemble models, but I think we can handle it using the proposed API if we add one accessor function. First, we regard the evaluation data as “metadata” (because it is not itself going to be sub-sampled, so is not “data” in the LearnAPI.jl sense) and so it is specified in `fit`, as suggested, using keyword arguments. This provides an interface point for the evaluation data. But the external controller also needs access to the internally computed predictions on the evaluation set, which we provide by adding an (optional) LearnAPI accessor function `out_of_sample_predictions(model, state, report)`. We arrange for `fit` to record the individual atomic model predictions in `state`, and our new accessor function returns the complete ensemble prediction (or `nothing` if evaluation data has not been provided). In the event `out_of_sample_predictions` is not implemented (is not flagged in the `LearnAPI.functions` trait) or it returns `nothing`, the external controller computes the out-of-sample predictions externally, “from scratch”.
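From the controller’s point of view, the logic would look roughly like this. (A sketch: the keyword names `X_eval`/`y_eval`, the squared-error loss, and the exact form of the `functions` trait values are placeholders; `fit`, `predict` and the proposed accessor are as just described.)

```julia
using Statistics: mean

# evaluation data is assumed to have entered training as "metadata", e.g.
# fitted_params, state, report = LearnAPI.fit(model, verbosity, X, y; X_eval, y_eval)
function out_of_sample_loss(model, fitted_params, state, report, X_eval, y_eval)
    implemented = :out_of_sample_predictions in LearnAPI.functions(model)  # trait check (form guessed)
    ŷ = implemented ? LearnAPI.out_of_sample_predictions(model, state, report) : nothing
    if ŷ === nothing
        # accessor not implemented, or no evaluation data was provided to `fit`:
        # fall back to computing the predictions "from scratch"
        ŷ, _ = LearnAPI.predict(model, fitted_params, X_eval)   # assuming (output, report)
    end
    return mean(abs2, ŷ .- y_eval)
end
```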
How does that sound, @jeremiedb?
> Maybe related to the above, but I’m not sure to grasp well the difference in scope of update! and ingest! mentioned in LearnAPI. I’m wondering if a single fit! could be both sufficient and lighten the API verbs.
Conceptually these strike me as different, so separate verbs seem appropriate, no? Roughly, `update!` is for when the data is unchanged but the hyperparameters are not (for example, adding iterations to an iterative model without retraining from scratch), while `ingest!` is for training further on new data without revisiting the old (incremental/online learning).
> One thing that I really dislike about the current state of affairs is the inconsistence with output types. Models which are probabilistic may output distributions, categories, integers, etc. and that is a pain to post-process. Moreover, developers of models are lost with so much flexibility and end up choosing whatever they feel is more natural. As a result, end-users struggle to write generic scripts that accept any kind of model.
@juliohm I agree, but I’m suggesting that responsibility for nailing down the allowed representations should lie with a higher-level interface (such as MLJ, which indeed tries to do this). For example, such an interface could require that if the `predict_proxy_type` is `LearnAPI.Distribution()` then the output of `predict` must support the `pdf` method from Distributions.
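Such a higher-level package might enforce its convention with a thin wrapper along these lines. (A sketch: the wrapper and its error messages are invented; the trait name follows the paragraph above, and operations are again assumed to return `(output, report)`.)

```julia
using Distributions

function probabilistic_predictions(model, fitted_params, Xnew)
    LearnAPI.predict_proxy_type(model) isa LearnAPI.Distribution ||
        error("This interface requires models whose predictions are probability distributions.")
    ŷ, report = LearnAPI.predict(model, fitted_params, Xnew)
    eltype(ŷ) <: Distributions.Distribution ||
        error("Predictions must support `pdf` from Distributions.")
    return ŷ, report
end
```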
Part of the problem is that agreement about “best representations” of data is still a bit fluid in Julia, so I’m reluctant to lock this in at such a low level. What is provided are traits to articulate what the model output actually looks like, either in terms of scientific types, ordinary types, or individual observation types/scitypes.
What do others think about this?
> I think we should try to reduce the noise with output types as much as possible, and make sure that API functions like predict have a well-known (and fixed) output scientific type. We can then add extra API functions like predictprob for models that support variations of the output.
I’m not sure I properly understand this part. Are you suggesting that:
- Every model that computes a proxy for the target (such as a probability distribution, confidence interval, survival probability, etc.) should be required to also compute actual target values; and
- `predict` should be reserved for actual target predictions and not the target proxy?
The problem I have here is that computing actual target values may require secondary computations, and new input from the user. For example, in probabilistic programming, it is common to return a “sampleable” object representing a probabilistic target, in lieu of a concrete target prediction. To get a point value requires sampling the object. Also, in any kind of probabilistic predictor, we need to decide whether we want the mode, median, or mean; or maybe we should apply a probability threshold, to be learned using evaluation data; and how many random samples do we take from our sampleable object? And so forth.
One limitation of the current proposal, which may be related to semantic concerns you have (I’m guessing here), is that a model can only `predict` one kind of target proxy. And this seems reasonable, since most models have a single proxy type as the object of their computation; everything else is generally post-processing. However, if it would create less cognitive dissonance, we could make the target proxy type an argument of `predict`, with a model having the option to implement more than one:

```julia
# point predictions of the actual target:
LearnAPI.predict(model::MyModel, fitted_params, Xnew, ::LearnAPI.TrueTarget)

# probabilistic predictions:
LearnAPI.predict(model::MyModel, fitted_params, Xnew, ::LearnAPI.Distribution)
```
Or, as in Python, we could have a plethora of dedicated operations, `predict_distribution`, `predict_survival_probability`, etc. - one for each of the 16 different proxy types already identified here, and growing. Would others prefer this?
> Also, please avoid macros if they are still present in LearnAPI.jl. I didn’t find them in the docs, but would like to just point this out before it is too late.
@juliohm There is indeed a convenience macro, `@trait` (the only exported name), which provides a shorthand for declaring traits. There’s an example here; the code is here.
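For context, without the macro a trait declaration is just an ordinary method definition, and the macro merely collapses several of these into one call. Roughly (whether traits are declared on the model type or on instances, and the exact macro syntax, are details I am glossing over here):

```julia
struct MyRegressor end   # stand-in model type

# longhand trait declarations:
LearnAPI.predict_proxy(::MyRegressor) = LearnAPI.TrueTarget()
LearnAPI.position_of_target(::MyRegressor) = 2

# shorthand with the macro (indicative syntax only):
# @trait MyRegressor predict_proxy=LearnAPI.TrueTarget() position_of_target=2
```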
It seems innocuous enough to me. What do others think?
> How wide is the scope of this API? Is it primarily statistical models or would it also include other types like symbolic regression and so forth?
@DoktorMike, you can get a rough idea of the intended scope from this list (which will likely be extended). I think symbolic regression would be fine.
> In the search for something lighter and simpler than MLJ I’ve found Invenia’s Models.jl quite useful, which nicely separates untrained models (templates) from trained models. Some of the ideas there may be helpful in this endeavour?
Models.jl is nice, but it appears to require that models subtype a Models.jl abstract type, and we are trying to avoid that. Also, it provides only a single “operation”, `predict`, while I have found it useful and natural to have `transform` and `inverse_transform` methods as well. This is one of the features of sk-learn that I like.
> Also, a project that I’m involved in has models being written to disk/database after training, then read into memory in several separate processes (in parallel) to be used (predicted) in long-running simulations. That is, each model is trained in 1 process and subsequently used in several parallel processes. Ideally the serialization format would use something not specific to Julia, such as JSON or something similar. You’ve probably got this use case covered, but worth mentioning anyway.
@jocklawrie Mmmm. My feeling is that responsibility for serialization should live at a higher level. What is missing, but planned, is a model-specific method to convert the `fitted_params` to a “serializable” form, by which I mean a form that is persistent (not, for example, a C pointer) and anonymized. For most models `fitted_params` is already serializable, but this is not universally the case. And then there would be a method to restore a deserialized object to the form needed by `predict`, etc.
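Purely to illustrate the planned addition (none of these method names exist yet; the whole block is hypothetical):

```julia
# fallbacks: `fitted_params` is assumed already persistent and anonymized
serializable(model, fitted_params) = fitted_params
restore(model, serializable_form) = serializable_form

# a wrapper for, say, a C library would overload both:
# serializable(model::MyCWrapper, fp) = copy_out_of_c_memory(fp)
# restore(model::MyCWrapper, s) = load_into_c_memory(s)
```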
> It [the LearnAPI.jl documentation] also starts with a statement that I disagree: “Machine learning algorithms, also called models, have a complicated taxonomy. Grouping models, or modeling tasks, into a relatively small number of types, such as “classifier” and “clusterer”, and attempting to impose uniform behavior within each group, is challenging.” Machine learning algorithms are not the same concept as machine learning models. A learning algorithm is used to learn parameters of a learning model (e.g. maximum likelihood estimation can be used learn coefficients of a linear model).
I’m happy to stand corrected on the distinction between models and algorithms. But
otherwise I stand by the opening statement. This is the central point really.
> These models fit in well-known categories with well-defined behavior
@juliohm This may be so, but the number of such categories is very large. For example, not all clusterers are the same. Some generalize to new data (and will implement `fit`) but some don’t; most compute ordinary labels (`predict_proxy` will have the value `LabelAmbiguous()`) but some predict “soft” (probabilistic) labels (`predict_proxy` will be `LabelAmbiguousDistribution()`). It may ultimately be useful to define `Clusterer` as a LearnAPI model with behavior varying within such-and-such bounds (articulated via LearnAPI traits), but I don’t think this should happen in LearnAPI itself.
> in the case of supervised models, are there other required API functions other than fit and predict?
Yes. Since a supervised model has the concept of a target variable, and `predict` outputs the target or a target proxy, you should make a `predict_proxy` trait declaration (see here and here) and a `position_of_target` declaration (see here). Finally, you must declare which methods you have explicitly overloaded (the `functions` trait). But that’s it. Optional traits include promises about the scitype of the training data (the `target_fit_scitype` trait) or whether per-observation weights are supported (the `position_of_weights` trait).
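Pulling that together, a compulsory-methods-only implementation might look roughly like the following. (A sketch: the `fit`/`predict` signatures and trait names are as discussed in this thread, but the exact trait values, such as what the `functions` trait returns, are guesses on my part.)

```julia
import LearnAPI
using LinearAlgebra: I

struct MyRidge                        # "the model": just a container for hyperparameters
    lambda::Float64
end

function LearnAPI.fit(model::MyRidge, verbosity, X, y)
    # X assumed to be a matrix with observations as rows, just for this sketch
    coefficients = (X'X + model.lambda * I) \ (X'y)
    fitted_params = (coefficients = coefficients,)
    state = nothing                   # nothing needed for updates
    report = nothing                  # no byproducts worth exposing
    return fitted_params, state, report
end

function LearnAPI.predict(model::MyRidge, fitted_params, Xnew)
    return Xnew * fitted_params.coefficients, nothing    # (predictions, report)
end

# compulsory trait declarations:
LearnAPI.predict_proxy(::MyRidge) = LearnAPI.TrueTarget()   # predictions are literal targets
LearnAPI.position_of_target(::MyRidge) = 2                  # y is the second data argument
LearnAPI.functions(::MyRidge) = (:fit, :predict)            # overloaded methods (form guessed)
```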