[ANN] LearnAPI.jl - Proposal for a basement-level machine learning API

I guess I would go with Strategy if Options and Config are not on the table. :stuck_out_tongue:

Config or Options > Strategy > Learner > Algorithm

2 Likes

Of the choices given: Strategy

2 Likes

I still think adding an extra word (a suffix) to Model might make it self-evident, when reading documentation, which one is just a struct and which one is “the whole object” from which one can make predictions. I haven’t heard anyone complain that they would be confused by any of the following pairs if they found them in the same documentation:

  • Model vs ModelHyperParams
  • Model vs ModelStrategy
  • Model vs ModelStruct
  • Model vs ModelConfig
  • Model vs ModelBluePrint
  • Model vs ModelSchema

Having two names that define each other (kind of small and big) tells you which one is just a “schema/struct” and which one is the object used to generate predictions.

Of the proposed ones, for hyperparams struct I would use Strategy. But still, ModelStrategy for the struct and Model for the model instance would make it more evident which one is a model “ready to be used” and which one is a “configuration struct” for the model.

Arguing indefinitely about nomenclature is not super productive, but I have to thank @ablaom for trying to get different points of view! At the end of the day, it is the smart way to build a community that will later use this API.

3 Likes

After having read the current version of LearnAPI.jl I like the term “algorithm” (“a precise step-by-step plan for a computational procedure”).

Some examples that I tried:

# Fitting a neural network classifier
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
params, state, = fit(alg, 0, X, y)
y, = predict(alg, LiteralTarget, params, Xtest)
# continue training
alg.epochs = 10
params, state = update!(alg, 0, state, params, X, y)

# performing PCA
alg = PCA(variance_ratio = 1)
pcs, = fit(alg, 0, X)
scores = transform(alg, pcs, X2)

# hierarchical clustering
alg = HierarchicalClustering(k = 3, linkage = :complete)
_, _, report = fit(alg, 0, X)
plot(report.dendrogram)
report.cutter(h = 4) # returns cluster assignments

# Automatic Hyperparameter Tuning
alg = RidgeRegressor()
tuned_alg = TunedAlgorithm(algorithm = alg,
                           resampling = CV(nfolds = 10),
                           range = range(alg, :lambda,
                                         lower = 1e-9, upper = 1e-1))
params, _, report = fit(tuned_alg, 0, X, y)
y, = predict(tuned_alg, LiteralTarget, params, Xtest)

I would also be fine with strategy (I guess I would replace alg by strat in the examples above) or learner. Everything else, e.g. config, options or hyper would feel a bit awkward to me. Or would you be fine with fit(options, 0, X, y)?

1 Like

Some ideas unrelated to the naming question:

  1. I’m not so happy with how verbosity is treated in MLJ and in the LearnAPI proposal. That is, I prefer the keyword approach of MLJ, but I would like to have a simple way to change the default verbosity level: for example, a function LearnAPI.default_verbosity(level), while keeping the signature fit(alg, X, y; verbosity) to overrule the default.
  2. I wonder, if it would make sense to define a FitResult structure (could also be called Model) with fields algorithm, params, state, report. This would allow for clean and simple signatures for predict, update!, ingest!, transform, for example
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
model = fit(alg, X, y)
y = predict(LiteralTarget, model, Xtest)
model.algorithm.epochs = 10
update!(model, X, y)

alg = PCA()
model = fit(alg, X)
scores = transform(model, X2)
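
A minimal sketch of how these two ideas could fit together, under stated assumptions: DEFAULT_VERBOSITY, default_verbosity, MeanRegressor and this FitResult layout are hypothetical names made up for illustration, not part of LearnAPI.jl or MLJ. The toy algorithm just predicts a (shrunken) mean of the training targets.

```julia
# Hypothetical sketch; none of these names are part of LearnAPI.jl or MLJ.

# Global default verbosity with a getter and a setter.
const DEFAULT_VERBOSITY = Ref(1)
default_verbosity() = DEFAULT_VERBOSITY[]
default_verbosity(level::Integer) = (DEFAULT_VERBOSITY[] = level)

# Toy algorithm: predicts the mean of the training targets times `shrinkage`.
mutable struct MeanRegressor
    shrinkage::Float64
end
MeanRegressor(; shrinkage = 1.0) = MeanRegressor(shrinkage)

# Bundle of everything `fit` produces, with the fields named in the post.
mutable struct FitResult{A,P,S,R}
    algorithm::A
    params::P
    state::S
    report::R
end

# The keyword overrules the global default for a single call.
function fit(alg::MeanRegressor, X, y; verbosity = default_verbosity())
    verbosity > 0 && @info "Fitting MeanRegressor on $(length(y)) observations"
    mu = alg.shrinkage * sum(y) / length(y)
    return FitResult(alg, (mean = mu,), nothing, (n = length(y),))
end

# One prediction per row of Xnew.
predict(model::FitResult{MeanRegressor}, Xnew) = fill(model.params.mean, size(Xnew, 1))

X, y = rand(4, 2), [1.0, 2.0, 3.0, 4.0]
default_verbosity(0)                               # silence logging globally
model = fit(MeanRegressor(), X, y)                 # uses the global default
model = fit(MeanRegressor(), X, y; verbosity = 1)  # overrules it for this call
yhat = predict(model, rand(3, 2))
```

With this layout, predict only needs the fitted object and the new data, at the cost of a mutable container that carries the algorithm around with the learned parameters.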
1 Like

My ranking: Learner* (because of the package name) > Strategy > Algorithm

(* two more cents: I don’t quite see the point about generalisation, I’d imagine the point is they learn from data. Isn’t that part of the reasoning of the package name? )

1 Like

@CameronBieganek, @wc4wc4wc4, @DoktorMike, @tecosaur, @adienes: I am surprised by the popularity of Learner and Strategy. When I read about machine learning (say on Wikipedia) I see the terms “machine learning algorithm” or “machine learning method” quite often. I mean, what is the common thing about PCA, DBSCAN, Neural Network Classifiers, Lasso, Support Vector Machines, kNN, KMeans, Q-Learning, etc.? They are machine learning algorithms (or machine learning methods). Yes, each algorithm has some knobs to tune, but for me a k-nearest-neighbor classifier with k = 3 or a Lasso with λ = 10⁻³ is still a machine learning algorithm. Actually, without choosing the hyperparameters the precise computational procedure would not be fully specified, so kNN without a specified k could be seen as a “machine learning approach” rather than a “machine learning algorithm”.
Therefore, the more I think about it, the more I like Algorithm for the “struct whose name represents a machine learning algorithm (such as decision tree classifier), and whose fields represent the hyperparameters” (as specified in the GitHub issue). Or maybe MLMethod? But Algorithm feels better to me.
What is it that makes you prefer Strategy or Learner?

3 Likes

Algorithm is much too abstract IMO. An algorithm is just a finite sequence of specific instructions, and when I hear “Algorithm (computer science)” the first things that come to mind to me are:

  • Dijkstra’s algorithm
  • Shor’s algorithm
  • Brent’s algorithm
  • A* algorithm
  • Sorting algorithms
  • Sieve of Atkin
  • Runge–Kutta methods
  • Gram–Schmidt process
  • etc.

Learner seems better to me in that it would be interpreted as “an algorithm that ‘learns’ from some data”, and (this is a key bit to me) goes nicely with the package name: LearnAPI as a package for Learners.

Strategy goes in-between those two in my mind.

10 Likes

FWIW, FluxTraining.jl (and FastAI.jl, which builds on it) uses Learner, as do some higher-level Python DL libraries. I think the motivation there is to encapsulate training- and evaluation-related state.

5 Likes

In my mind, the rule of thumb is the following:

Algorithms are verbs and are represented by functions. Data structures are nouns and are represented by structs.

Here are some examples of that pattern in Julia:

| Data | Algorithm |
|---|---|
| Vector | sort |
| DataFrame | left_join |
| SimpleGraph | dijkstra_shortest_paths |
| Iterator (e.g. Vector) | unique |
| GLM.LinearModel | fit |
| Matrix | * |
| Matrix | factorize |
| ODEProblem | solve |

Thus, I don’t think it’s great to put the word Algorithm in the name of a bunch of structs. And, in fact, the word Strategy is awfully close to Algorithm and thus falls by the same argument. Learner seems like a very active concept for a struct which is only a bag of hyperparameters and does not contain any of the data structures that store what the model learns. Thus, the only viable option is Options. :wink:

5 Likes

Good points. For me the main problem with algorithm is that I’d expect it to be deterministic and to “converge”, which does not fit very well with many machine learning approaches. This is also why I don’t use the term machine learning algorithm. Machine learning method, absolutely.

There exists a strategy to learn but few guarantees are given. This is why I prefer Strategy of the options given.

1 Like

Thanks a lot for the reply. This helps me see your point better, but I disagree with your conclusions. I think there are many examples where the table should rather be

| Data | Function | Algorithm | Example |
|---|---|---|---|
| Vector | sort | ConsiderRadixSort | sort(rand(1:10, 10), alg = Base.Sort.ConsiderRadixSort(radix=Base.Sort.RadixSort())) |
| ODEProblem | solve | RK4 | solve(ode, RK4(step_limiter! = OrdinaryDiffEq.trivial_limiter!)) |
| Function | optimize | BFGS | optimize(f, x0, BFGS(linesearch = LineSearches.HagerZhang())) |

In all these examples the algorithm is a struct with some options. In some cases there is no reasonable alternative to a default algorithm (e.g. for left_join), in which case it makes sense not to follow this pattern. But I think in machine learning it would make a lot of sense to follow this pattern, because there are always different machine learning algorithms/methods to fit some data.
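
As a small, hedged illustration of this pattern: the first part uses only InsertionSort, QuickSort and MergeSort, which Base exports; the second part applies the same shape to a toy learner. KNNRegressor and this fit/predict pair are hypothetical names for the sketch, not LearnAPI.jl API.

```julia
# The sorting method is selected by passing a value; `sort` is the function that runs it.
x = [3, 1, 2]
for alg in (InsertionSort, QuickSort, MergeSort)
    @assert sort(x; alg = alg) == [1, 2, 3]
end

# Same pattern for a toy learner (hypothetical names, illustration only).
struct KNNRegressor
    k::Int   # number of neighbours, the only hyperparameter
end

# For kNN, fitting just stores the training data together with k.
fit(alg::KNNRegressor, x::Vector{Float64}, y::Vector{Float64}) = (k = alg.k, x = x, y = y)

# Predict by averaging the targets of the k nearest training points.
function predict(fitted, xnew::Float64)
    nearest = partialsortperm(abs.(fitted.x .- xnew), 1:fitted.k)
    return sum(fitted.y[nearest]) / fitted.k
end

fitted = fit(KNNRegressor(2), [0.0, 1.0, 2.0, 10.0], [0.0, 1.0, 2.0, 10.0])
prediction = predict(fitted, 0.4)   # averages the targets at x = 0.0 and x = 1.0
```

In both parts the struct carries no learned state; it only picks and configures the procedure that the function then executes.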

Err, not really. In these examples the behaviour of the function is affected/configured by the struct. But an algorithm is fundamentally a sequence of instructions. A struct does not do anything, unless it has an ::Expr field it does not itself contain/perform an algorithm. The algorithm itself is performed by a function. These two things are mutually exclusive.

I see your point, thanks! I can understand that machine learning, in particular deep learning, is sometimes perceived as being fancy, blackboxy and unreliable. But with fixed hyperparameters and random seeds, any machine learning algorithm I know is deterministic and converges, at least in the same sense as a local optimization algorithm like BFGS converges to a point near a critical point of some non-convex function. And if e.g. HagerZhang is a line search algorithm, BFGS (which calls the line search deterministically) is an optimization algorithm, I would call RidgeRegression (which calls BFGS deterministically) a machine learning algorithm.

Btw. while writing, a colleague entered the office and we had a brief discussion. He said that the “thing” that defines how the fit function transforms some training data into a model of the data should be called algorithm :).

In conclusion Algorithm or MLMethod remain my favorites.

PS: another example from the Julia ecosystem: DifferentialEquations calls RK4 etc. “methods”.

Well, yes. BFGS and RK4 are just the names of the algorithms. But isn’t it much more natural to write
sort(x, alg = ConsiderRadixSort()) instead of
sort(x, option = ConsiderRadixSort()) or sort(x, config = ConsiderRadixSort()) or sort(x, strategy = ConsiderRadixSort())?

At least in DifferentialEquations they are not bothered by calling this struct an algorithm:

julia> supertype(typeof(RK4()))
OrdinaryDiffEq.OrdinaryDiffEqAdaptiveAlgorithm

I knew I was setting myself up for this objection. :wink:

None of those structs that you mention have the word Algorithm tacked on to the end. They really are just configuration or options structs. I have no problem with a phrase like “the random forest algorithm”. The tricky part is that sometimes we would like to distinguish between the algorithm and the actual ensemble of learned decision trees. So we have both the random forest algorithm and the random forest data structure. If I hear the phrase “random forest”, I think of it as a noun—it’s the ensemble of learned decision trees.

The only reason we have this problem is because we are trying to take a functional approach where we split the input and the output into two different types. We could shift the problem to the output rather than the input. In other words, we could have the following signature for fit:

fit(::RandomForestRegressor, ...) -> RandomForestRegressorModel

Now we have to choose between Model and Fit for the suffix. Unfortunately, RegressorModel seems rather redundant…

I still prefer something like Config and Fit as suffixes. I find it simple, intuitive and unambiguous. For example,

fit(cfg::RandomForestConfig, data...) -> fitted::RandomForestFit
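
To make the split concrete, here is a minimal sketch with a one-feature ridge regression standing in for the random forest. RidgeConfig, RidgeFit and this fit/predict pair are hypothetical names chosen for the example; nothing here is an existing package API.

```julia
# Hypothetical names; a sketch of the Config -> Fit split, not an existing API.

# Input type: only hyperparameters, nothing learned.
struct RidgeConfig
    lambda::Float64
end
RidgeConfig(; lambda = 1.0) = RidgeConfig(lambda)

# Output type: the learned coefficient, plus the config that produced it.
struct RidgeFit
    config::RidgeConfig
    coef::Float64
end

# One-feature ridge regression without intercept:
# coef = sum(x .* y) / (sum(x .^ 2) + lambda)
function fit(cfg::RidgeConfig, x::AbstractVector, y::AbstractVector)
    coef = sum(x .* y) / (sum(abs2, x) + cfg.lambda)
    return RidgeFit(cfg, coef)
end

predict(fitted::RidgeFit, xnew) = fitted.coef .* xnew

fitted = fit(RidgeConfig(lambda = 0.0), [1.0, 2.0], [2.0, 4.0])
yhat = predict(fitted, [3.0])
```

The two types make the direction of the data flow visible in the signature: predict cannot be called on a bare config, because no method accepts one.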

Algorithm, Learner and Strategy seem ambiguous/confusing to me, which this discussion appears to corroborate.

  • What do these terms provide over and above Config? (honest question, I’m just not seeing it)
  • Is the value-add worth the potential confusion that we’re already seeing, even among specialists?
  • Will applied researchers have to consult the docs for the meaning of Algorithm every time they use the package, which may be only sporadically?
  • Does this meaning clash with our understanding of algorithms as procedures/recipes?

Config offers complete specification of both simple and sophisticated model types without loss of semantic precision. Just add data.

1 Like

How about UnfittedModel and FittedModel?

The downside of those is that they’re kind of long when you tack them on as a suffix, e.g.

RandomForestRegressorUnfittedModel