[ANN] LearnAPI.jl - Proposal for a basement-level machine learning API

Some ideas unrelated to the naming question:

  1. I’m not entirely happy with how verbosity is treated in MLJ and in the LearnAPI proposal. I prefer the keyword approach of MLJ, but I would also like a simple way to change the default verbosity level, for example via a function LearnAPI.default_verbosity(level), while keeping the signature fit(alg, X, y; verbosity) to overrule the default (see the sketch after the example below).
  2. I wonder if it would make sense to define a FitResult structure (which could also be called Model) with fields algorithm, params, state, report. This would allow clean and simple signatures for predict, update!, ingest! and transform, for example:
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
model = fit(alg, X, y)
ŷ = predict(LiteralTarget, model, Xtest)
model.algorithm.epochs = 10
update!(model, X, y)

alg = PCA()
model = fit(alg, X)
scores = transform(model, X2)
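
To make idea 1 concrete, here is a minimal sketch of how a default verbosity level could work (DEFAULT_VERBOSITY and the exact signatures are my assumptions, not part of the actual proposal):

const DEFAULT_VERBOSITY = Ref(1)

# change the global default verbosity level
default_verbosity(level::Integer) = (DEFAULT_VERBOSITY[] = level)

# fit picks up the global default; the keyword overrules it
function fit(alg, X, y; verbosity::Integer = DEFAULT_VERBOSITY[])
    verbosity > 0 && @info "fitting $(typeof(alg))"
    # ... actual fitting happens here ...
end
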
1 Like

My ranking: Learner* (because of the package name) > Strategy > Algorithm

(*two more cents: I don’t quite see the point about generalisation; I’d imagine the point is that they learn from data. Isn’t that part of the reasoning behind the package name?)

1 Like

@CameronBieganek, @wc4wc4wc4, @DoktorMike, @tecosaur, @adienes: I am surprised by the popularity of Learner and Strategy. When I read about machine learning (say on Wikipedia) I see the terms “machine learning algorithm” or “machine learning method” quite often. I mean, what is the common thing about PCA, DBSCAN, Neural Network Classifiers, Lasso, Support Vector Machines, kNN, KMeans, Q-Learning, etc.? They are machine learning algorithms (or machine learning methods). Yes, each algorithm has some knobs to tune, but for me a k-nearest-neighbor classifier with k = 3 or a Lasso with λ = 10⁻³ is still a machine learning algorithm. Actually, without choosing the hyperparameters the precise computational procedure would not be fully specified, such that kNN without specification of k could be seen as a “machine learning approach” rather than a “machine learning algorithm”.
Therefore, the more I think about it, the more I like Algorithm for the “struct whose name represents a machine learning algorithm (such as decision tree classifier), and whose fields represent the hyperparameters” (as specified in the gh issue). Or maybe MLMethod? But Algorithm feels better to me.
What is it that makes you prefer Strategy or Learner?

3 Likes

Algorithm is much too abstract IMO. An algorithm is just a finite sequence of specific instructions, and when I hear “Algorithm (computer science)” the first things that come to my mind are:

  • Dijkstra’s algorithm
  • Shor’s algorithm
  • Brent’s algorithm
  • A* algorithm
  • Sorting algorithms
  • Sieve of Atkin
  • Runge–Kutta methods
  • Gram–Schmidt process
  • etc.

Learner seems better to me in that it would be interpreted as “an algorithm that ‘learns’ from some data”, and (this is a key bit to me) goes nicely with the package name: LearnAPI as a package for Learners.

Strategy goes in-between those two in my mind.

11 Likes

FWIW, FluxTraining.jl (and FastAI.jl, which builds on it) uses Learner, as do some higher-level Python DL libraries. I think the motivation there is to encapsulate training- and evaluation-related state.

5 Likes

In my mind, the rule of thumb is the following:

Algorithms are verbs and are represented by functions. Data structures are nouns and are represented by structs.

Here are some examples of that pattern in Julia:

Data                    Algorithm
Vector                  sort
DataFrame               left_join
SimpleGraph             dijkstra_shortest_paths
Iterator (e.g. Vector)  unique
GLM.LinearModel         fit
Matrix                  *
Matrix                  factorize
ODEProblem              solve

Thus, I don’t think it’s great to put the word Algorithm in the name of a bunch of structs. And, in fact, the word Strategy is awfully close to Algorithm and thus falls by the same argument. Learner seems like a very active concept for a struct which is only a bag of hyperparameters and does not contain any of the data structures that store what the model learns. Thus, the only viable option is Options. :wink:

6 Likes

Good points. For me the main problem with algorithm is that I’d expect it to be deterministic and “converge”, which does not fit very well with many machine learning approaches. This is also why I don’t use the term machine learning algorithm. Machine learning method, absolutely.

There exists a strategy to learn but few guarantees are given. This is why I prefer Strategy among the options given.

1 Like

Thanks a lot for the reply. This helps me see your point better, but I disagree with your conclusions. I think there are many examples where the table should rather be

Data        Function   Algorithm          Example
Vector      sort       ConsiderRadixSort  sort(rand(1:10, 10), alg = Base.Sort.ConsiderRadixSort(radix = Base.Sort.RadixSort()))
ODEProblem  solve      RK4                solve(ode, RK4(step_limiter! = OrdinaryDiffEq.trivial_limiter!))
Function    optimize   BFGS               optimize(f, x0, BFGS(linesearch = LineSearches.HagerZhang()))

In all these examples the algorithm is a struct with some options. In some cases there is no reasonable alternative to a default algorithm (e.g. for left_join), in which case it makes sense not to follow this pattern. But I think in machine learning it would make a lot of sense to follow this pattern, because there are always different machine learning algorithms/methods to fit some data.

Err, not really. In these examples the behaviour of the function is affected/configured by the struct. But an algorithm is fundamentally a sequence of instructions. A struct does not do anything; unless it has an ::Expr field, it does not itself contain or perform an algorithm. The algorithm itself is performed by a function. These two things are mutually exclusive.

I see your point, thanks! I can understand that machine learning, in particular deep learning, is sometimes perceived as being fancy, blackboxy and unreliable. But with fixed hyperparameters and random seeds, any machine learning algorithm I know is deterministic and converges, at least in the same sense as a local optimization algorithm like BFGS converges to a point near a critical point of some non-convex function. And if, e.g., HagerZhang is a line search algorithm and BFGS (which calls the line search deterministically) is an optimization algorithm, then I would call RidgeRegression (which calls BFGS deterministically) a machine learning algorithm.

Btw., while I was writing this, a colleague entered the office and we had a brief discussion. He said that the “thing” that defines how the fit function transforms some training data into a model of the data should be called an algorithm :).

In conclusion, Algorithm and MLMethod remain my favorites.

Ps. another example from the Julia ecosystem: DifferentialEquations calls RK4 etc. “methods”.

Well, yes. BFGS and RK4 are just the names of the algorithms. But isn’t it much more natural to write
sort(x, alg = ConsiderRadixSort()) instead of
sort(x, option = ConsiderRadixSort()) or sort(x, config = ConsiderRadixSort()) or sort(x, strategy = ConsiderRadixSort()).

At least in DifferentialEquations they are not bothered by calling this struct an algorithm:

julia> supertype(typeof(RK4()))
OrdinaryDiffEq.OrdinaryDiffEqAdaptiveAlgorithm

I knew I was setting myself up for this objection. :wink:

None of those structs that you mention have the word Algorithm tacked on to the end. They really are just configuration or options structs. I have no problem with a phrase like “the random forest algorithm”. The tricky part is that sometimes we would like to distinguish between the algorithm and the actual ensemble of learned decision trees. So we have both the random forest algorithm and the random forest data structure. If I hear the phrase “random forest”, I think of it as a noun—it’s the ensemble of learned decision trees.

The only reason we have this problem is because we are trying to take a functional approach where we split the input and the output into two different types. We could shift the problem to the output rather than the input. In other words, we could have the following signature for fit:

fit(::RandomForestRegressor, ...) -> RandomForestRegressorModel

Now we have to choose between Model and Fit for the suffix. Unfortunately, RegressorModel seems rather redundant…

I still prefer something like Config and Fit as suffixes. I find it simple, intuitive and unambiguous. For example,

fit(cfg::RandomForestConfig, data...) -> fitted::RandomForestFit

Algorithm, Learner and Strategy seem ambiguous/confusing to me, which this discussion appears to corroborate.

  • What do these terms provide over and above Config? (honest question, I’m just not seeing it)
  • Is the value-add worth the potential confusion that we’re already seeing, even among specialists?
  • Will applied researchers have to consult the docs for the meaning of Algorithm every time they use the package, which may be only sporadically?
  • Does this meaning clash with our understanding of algorithms as procedures/recipes?

Config offers complete specification of both simple and sophisticated model types without loss of semantic precision. Just add data.
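
For concreteness, a minimal sketch of what the Config/Fit pairing could look like (all type and field names here are hypothetical):

struct RandomForestConfig
    n_trees::Int
    max_depth::Int
end

struct RandomForestFit
    config::RandomForestConfig
    trees::Vector{Any}    # the learned ensemble
end

function fit(cfg::RandomForestConfig, X, y)
    trees = Any[]         # placeholder: train cfg.n_trees trees here
    return RandomForestFit(cfg, trees)
end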

1 Like

How about UnfittedModel and FittedModel?

The downside of those is that they’re kind of long when you tack them on as a suffix, e.g.

RandomForestRegressorUnfittedModel

I still kind of like MetaModel :slight_smile: and the fit version could be LearnedModel

Similar to @dilumaluthge’s pairing, but I am not sure I like the grammar of Fitted

RandomForestClassifierMetaModel and RandomForestClassifierLearnedModel

vs.

RandomForestClassifierStrategy

I have been following the discussion with interest, but not very closely. The naming convention that we are discussing here is not mandatory though, i.e. if I implement the API, I’m free to name my structs as I wish, right?

Personally, I tend towards not appending too many words.

Does each model class need its own options/config/algorithm struct? How about using the same settings for different (similar) models? Maybe I want to fit 10 different types of classifiers, but share some options between them. Does LearnAPI have a way to deal with this? So maybe the setting’s name does not even have to be tied to a specific model type?

If I understand the current proposal correctly, the idea is not to suffix any word like Algorithm, Strategy or Config to the structs. Isn’t the example struct MyRidge <: LearnAPI.Algorithm quite obvious?

I don’t think that the current proposal suggests returning an XYZModel. Currently it is suggested to return a tuple (fitted_params, state, report). I suggested to wrap it in a struct FitResult. This could be parametric, if desired, e.g. fit(MyRidge, data...) -> FitResult{MyRidge} (or one could call it Model{MyRidge}, if preferred).
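
A minimal sketch of the wrapper I have in mind (field names as suggested earlier, everything hypothetical):

struct FitResult{A}
    algorithm::A
    params    # fitted parameters
    state     # whatever update!/ingest! need between calls
    report    # training diagnostics
end

# fit(alg::MyRidge, data...) would then return a FitResult{MyRidge}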

Yes, I think so.

I think this is the current suggestion.

Maybe @ablaom can confirm?

That’s true, I guess I’ve been arguing from the point of view of my counter-proposal near the top of this thread. In that counter-proposal I propose that fit return an actual type that is documented with a docstring. It seems cleaner to me to return a single object, rather than the tuple (fitted_params, state, report).

Your proposal of returning a FitResult{RandomForestRegressor} is similar in spirit, although as far as I know it’s not possible to attach a docstring to a specific instantiation of a parametric type. On the other hand, maybe I’m thinking too much like an R programmer. In R, object fields are often a documented part of the API, whereas we usually don’t do that in Julia. So, the creator of a random forest library could add a method like trees(::FitResult{RandomForestRegressor}) to extract the ensemble of fitted decision trees, and the trees method would of course have a docstring.

However, as a package developer, I want to have more control over my input and output types. I don’t want to be forced to subtype LearnAPI.Algorithm. To take an example from the ecosystem, the supertype of DataFrame is AbstractDataFrame, not Table. In fact, there is no Tables.Table type. Furthermore, I should be able to return whatever type I want to from fit. I don’t think we should use tuple output types except for the rare cases where it makes intuitive sense, like min, max = extrema(x) or d, r = divrem(x, y). If MLJ needs a state object somewhere, then the API for that should look something like the following:

model = fit(options, X, y)
s = state(model)

LearnAPI could provide a default state method like this:

struct Stateless end
state(::Any) = Stateless()

This way most custom model types don’t even need to worry about the state method.
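
A model type that does carry state would then opt in with its own method, e.g. (a hypothetical example):

struct MySGDModel
    weights::Vector{Float64}
    momentum::Vector{Float64}   # optimizer state carried between update! calls
end

state(model::MySGDModel) = model.momentum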

Additionally, the report concept need not enter into LearnAPI. It’s too ill-defined a concept. Instead, custom model types can implement whichever specific inspection methods make sense for their fitted model type, e.g.

options = RandomForestRegressorOptions()
model = fit(options, X, y)

# Inspect the learned ensemble of decision trees:
trees(model)

# Inspect the out-of-bag predictions:
oob_predictions(model)

# Inspect the max tree depth hyperparameter:
max_tree_depth(model)
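
Defining such inspection methods is a one-liner per accessor; a hypothetical sketch:

struct RandomForestRegressorOptions
    max_tree_depth::Int
end

struct RandomForestRegressorModel
    options::RandomForestRegressorOptions
    trees::Vector{Any}
    oob_predictions::Vector{Float64}
end

trees(model::RandomForestRegressorModel) = model.trees
oob_predictions(model::RandomForestRegressorModel) = model.oob_predictions
max_tree_depth(model::RandomForestRegressorModel) = model.options.max_tree_depth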

My broader concern with LearnAPI is that it is targeted at package developers. I believe we need a model interface that is easy to implement for both package developers and end users. In my day job as a data scientist, I use scikit-learn. I am an end user. I am developing application code, not library code. Yet I still regularly implement my own custom scikit-learn model types—and it is very simple and easy to do. It seems to me that implementing custom model types in LearnAPI (and the similar MLJModelInterface) is more complicated and less intuitive than it is in scikit-learn. It’s not friendly to end users.

3 Likes

I see. This is of course an important point to take into account when designing a new machine learning API!

I sympathize with this. I guess it would be possible to design LearnAPI by specifying only the function signatures, without imposing any type restrictions:

model = fit(options, data...)
predictions = predict(model, kind_of_prediction, data)
update!(model, data)
etc.

But maybe I am missing some important advantages of enforcing types. What do you think @ablaom?

Maybe one could use Dynamic Documentation?