I guess I would go with `Strategy` if `Options` and `Config` are not on the table.
`Config` or `Options` > `Strategy` > `Learner` > `Algorithm`
Of the choices given: Strategy
I still think adding an extra word (suffix) to `Model` might make the documentation self-evident about which one is just a struct and which one is “the whole object” from which one can make predictions. I haven’t heard anyone complain that they would be confused by any of the following pairs if they appeared in the same documentation:
- `Model` vs `ModelHyperParams`
- `Model` vs `ModelStrategy`
- `Model` vs `ModelStruct`
- `Model` vs `ModelConfig`
- `Model` vs `ModelBluePrint`
- `Model` vs `ModelSchema`
Having two names that define each other (kind of `small` and `big`) tells you which one is just a “schema/struct” and which one is the object used to generate predictions.
Of the proposed ones, for the hyperparams struct I would use `Strategy`. But still, `ModelStrategy` for the struct and `Model` for the model instance would make it more evident which one is a model “ready to be used” and which one is a “configuration struct” for the model.
Arguing indefinitely about nomenclature is not super productive, but I have to thank @ablaom for trying to get different points of view! At the end of the day, that is the smart way to build a community that will later use this API.
After having read the current version of LearnAPI.jl I like the term “algorithm” (“a precise step-by-step plan for a computational procedure”).
Some examples that I tried:
# Fitting a neural network classifier
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
params, state, = fit(alg, 0, X, y)
y, = predict(alg, LiteralTarget, params, Xtest)
# continue training
alg.epochs = 10
params, state = update!(alg, 0, state, params, X, y)
# performing PCA
alg = PCA(variance_ratio = 1)
pcs, = fit(alg, 0, X)
scores = transform(alg, pcs, X2)
# hierarchical clustering
alg = HierarchicalClustering(k = 3, linkage = :complete)
_, _, report = fit(alg, 0, X)
plot(report.dendrogram)
report.cutter(h = 4) # returns cluster assignments
# Automatic Hyperparameter Tuning
alg = RidgeRegressor()
tuned_alg = TunedAlgorithm(algorithm = alg,
                           resampling = CV(nfolds = 10),
                           range = range(alg, :lambda,
                                         lower = 1e-9, upper = 1e-1))
params, _, report = fit(tuned_alg, 0, X, y)
y, = predict(tuned_alg, LiteralTarget, params, Xtest)
I would also be fine with `strategy` (I guess I would replace `alg` by `strat` in the examples above) or `learner`. Everything else, e.g. `config`, `options` or `hyper`, would feel a bit awkward to me. Or would you be fine with `fit(options, 0, X, y)`?
Some ideas unrelated to the naming question:
- I’m not so happy with how verbosity is treated in MLJ and in the LearnAPI proposal. I prefer the keyword approach of MLJ, but I would like to have a simple way to change the default verbosity level. For example, a function `LearnAPI.default_verbosity(level)`, while keeping the signature `fit(alg, X, y; verbosity)` to overrule the default.
- I wonder if it would make sense to define a `FitResult` structure (could also be called `Model`) with fields `algorithm, params, state, report`. This would allow for clean and simple signatures for `predict, update!, ingest!, transform`, for example
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
model = fit(alg, X, y)
y = predict(LiteralTarget, model, Xtest)
model.algorithm.epochs = 10
update!(model, X, y)
alg = PCA()
model = fit(alg, X)
scores = transform(model, X2)
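To make the two ideas above a bit more concrete, here is a minimal sketch in plain Julia. Everything in it is hypothetical: `MeanRegressor`, `FitResult`, `default_verbosity`, and this `fit`/`predict` pair are stand-ins invented for illustration and are not part of LearnAPI.jl or MLJ.

```julia
# Hypothetical sketch; none of these names are LearnAPI.jl or MLJ API.

# Global default verbosity: read with `default_verbosity()`,
# change with `default_verbosity(level)`.
const DEFAULT_VERBOSITY = Ref(1)
default_verbosity() = DEFAULT_VERBOSITY[]
default_verbosity(level::Integer) = (DEFAULT_VERBOSITY[] = level)

# A toy "algorithm": predicts the (optionally shrunken) mean of the targets.
Base.@kwdef mutable struct MeanRegressor
    shrinkage::Float64 = 0.0
end

# Bundles everything `fit` produces, so later calls take a single object.
struct FitResult{A,P,S,R}
    algorithm::A
    params::P
    state::S
    report::R
end

# The `verbosity` keyword overrules the global default for one call.
function fit(alg::MeanRegressor, X, y; verbosity = default_verbosity())
    mu = (1 - alg.shrinkage) * sum(y) / length(y)
    verbosity > 0 && @info "Fitted MeanRegressor" mu
    return FitResult(alg, (; mu), nothing, (; n = length(y)))
end

# Prediction only needs the bundled result and the new inputs.
predict(model::FitResult{MeanRegressor}, Xnew) =
    fill(model.params.mu, size(Xnew, 1))

default_verbosity(0)    # silence logging globally
model = fit(MeanRegressor(), rand(4, 2), [1.0, 2.0, 3.0, 6.0])
yhat = predict(model, rand(3, 2))
```

Because the result carries its own `algorithm` field, `predict(model, Xnew)` does not need the algorithm passed separately, which is the simplification suggested above.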
My ranking: `Learner`* (because of the package name) > `Strategy` > `Algorithm`
(* two more cents: I don’t quite see the point about generalisation, I’d imagine the point is they learn from data. Isn’t that part of the reasoning of the package name? )
@CameronBieganek, @wc4wc4wc4, @DoktorMike, @tecosaur, @adienes: I am surprised by the popularity of `Learner` and `Strategy`. When I read about machine learning (say on Wikipedia) I see the terms “machine learning algorithm” or “machine learning method” quite often. I mean, what is the common thing about PCA, DBSCAN, Neural Network Classifiers, Lasso, Support Vector Machines, kNN, KMeans, Q-Learning, etc.? They are machine learning algorithms (or machine learning methods). Yes, each algorithm has some knobs to tune, but for me a k-nearest-neighbor classifier with $k = 3$ or a Lasso with $\lambda = 10^{-3}$ is still a machine learning algorithm. Actually, without choosing the hyperparameters the precise computational procedure would not be fully specified, so kNN without a specification of $k$ could be seen as a “machine learning approach” rather than a “machine learning algorithm”.
Therefore, the more I think about it, the more I like `Algorithm` for the “struct whose name represents a machine learning algorithm (such as decision tree classifier), and whose fields represent the hyperparameters” (as specified in the gh issue). Or maybe `MLMethod`? But `Algorithm` feels better to me.
What is it that makes you prefer `Strategy` or `Learner`?
`Algorithm` is much too abstract IMO. An algorithm is just a finite sequence of specific instructions, and when I hear “Algorithm (computer science)” the first things that come to mind are:
- Dijkstra’s algorithm
- Shor’s algorithm
- Brent’s algorithm
- A* algorithm
- Sorting algorithms
- Sieve of Atkin
- Runge–Kutta methods
- Gram–Schmidt process
- etc.
`Learner` seems better to me in that it would be interpreted as “an algorithm that ‘learns’ from some data”, and (this is a key bit to me) goes nicely with the package name: LearnAPI as a package for `Learner`s.
`Strategy` goes in-between those two in my mind.
FWIW, FluxTraining.jl (and FastAI.jl, which builds on it) uses `Learner`, as do some higher-level Python DL libraries. I think the motivation there is to encapsulate training- and evaluation-related state.
In my mind, the rule of thumb is the following:
Algorithms are verbs and are represented by functions. Data structures are nouns and are represented by structs.
Here are some examples of that pattern in Julia:
| Data | Algorithm |
|---|---|
| `Vector` | `sort` |
| `DataFrame` | `left_join` |
| `SimpleGraph` | `dijkstra_shortest_paths` |
| Iterator (e.g. `Vector`) | `unique` |
| `GLM.LinearModel` | `fit` |
| `Matrix` | `*` |
| `Matrix` | `factorize` |
| `ODEProblem` | `solve` |
Thus, I don’t think it’s great to put the word `Algorithm` in the name of a bunch of structs. And, in fact, the word `Strategy` is awfully close to `Algorithm` and thus falls by the same argument. `Learner` seems like a very active concept for a struct which is only a bag of hyperparameters and does not contain any of the data structures that store what the model learns. Thus, the only viable option is `Options`.
Good points. For me the main problem with “algorithm” is that I’d expect it to be deterministic and to “converge”, which does not fit very well with many machine learning approaches. This is also why I don’t use the term “machine learning algorithm”. “Machine learning method”, absolutely.
There exists a strategy to learn but few guarantees are given. This is why I prefer Strategy of the options given.
Thanks a lot for the reply. This helps me see your point better, but I disagree with your conclusions. I think there are many examples where the table should rather be
| Data | Function | Algorithm | Example |
|---|---|---|---|
| `Vector` | `sort` | `ConsiderRadixSort` | `sort(rand(1:10, 10), alg = Base.Sort.ConsiderRadixSort(radix=Base.Sort.RadixSort()))` |
| `ODEProblem` | `solve` | `RK4` | `solve(ode, RK4(step_limiter! = OrdinaryDiffEq.trivial_limiter!))` |
| `Function` | `optimize` | `BFGS` | `optimize(f, x0, BFGS(linesearch = LineSearches.HagerZhang()))` |
In all these examples the algorithm is a `struct` with some options. In some cases there is no reasonable alternative to a default algorithm (e.g. for `left_join`), in which case it makes sense not to follow this pattern. But I think in machine learning it would make a lot of sense to follow this pattern, because there are always different machine learning algorithms/methods to fit some data.
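Base Julia’s own sorting API is one place where this pattern can be tried directly: the algorithm is passed as a value through the `alg` keyword. The sketch below uses only `InsertionSort` and `MergeSort` from Base; the `describe` helper is hypothetical and is there only to show dispatch on the algorithm value.

```julia
v = [3, 1, 2]

# The same function, configured by two different algorithm objects.
a = sort(v; alg = InsertionSort)
b = sort(v; alg = MergeSort)

# Both objects are instances of subtypes of the abstract type Base.Sort.Algorithm.
is_alg = InsertionSort isa Base.Sort.Algorithm

# Hypothetical helper: behaviour chosen by dispatch on the algorithm's type.
describe(::typeof(InsertionSort)) = "quadratic, good for short vectors"
describe(::typeof(MergeSort)) = "O(n log n), stable"
```

Whether such a value should be called an “algorithm” or an “options struct” is exactly the naming question under discussion; the code only shows that the mechanism is the same either way.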
Err, not really. In these examples the behaviour of the function is affected/configured by the struct. But an algorithm is fundamentally a sequence of instructions. A struct does not do anything; unless it has an `::Expr` field, it does not itself contain or perform an algorithm. The algorithm itself is performed by a function. These two things are mutually exclusive.
I see your point, thanks! I can understand that machine learning, in particular deep learning, is sometimes perceived as being fancy, blackboxy and unreliable. But with fixed hyperparameters and random seeds, any machine learning algorithm I know is deterministic and converges, at least in the same sense as a local optimization algorithm like BFGS converges to a point near a critical point of some non-convex function. And if e.g. `HagerZhang` is a line search algorithm and `BFGS` (which calls the line search deterministically) is an optimization algorithm, I would call `RidgeRegression` (which calls `BFGS` deterministically) a machine learning algorithm.
Btw. while writing, a colleague entered the office and we had a brief discussion. He said that the “thing” that defines how the `fit` function transforms some training data into a model of the data should be called algorithm :).
In conclusion, `Algorithm` or `MLMethod` remain my favorites.
PS: another example from the Julia ecosystem: DifferentialEquations calls `RK4` etc. “methods”.
Well, yes. `BFGS` and `RK4` are just the names of the algorithms. But isn’t it much more natural to write `sort(x, alg = ConsiderRadixSort())` instead of `sort(x, option = ConsiderRadixSort())` or `sort(x, config = ConsiderRadixSort())` or `sort(x, strategy = ConsiderRadixSort())`?
At least in `DifferentialEquations` they are not bothered with calling this `struct` an algorithm:
julia> supertype(typeof(RK4()))
OrdinaryDiffEq.OrdinaryDiffEqAdaptiveAlgorithm
I knew I was setting myself up for this objection.
None of those structs that you mention have the word `Algorithm` tacked on to the end. They really are just configuration or options structs. I have no problem with a phrase like “the random forest algorithm”. The tricky part is that sometimes we would like to distinguish between the algorithm and the actual ensemble of learned decision trees. So we have both the random forest algorithm and the random forest data structure. If I hear the phrase “random forest”, I think of it as a noun: it’s the ensemble of learned decision trees.
The only reason we have this problem is that we are trying to take a functional approach where we split the input and the output into two different types. We could shift the problem to the output rather than the input. In other words, we could have the following signature for `fit`:
fit(::RandomForestRegressor, ...) -> RandomForestRegressorModel
Now we have to choose between `Model` and `Fit` for the suffix. Unfortunately, `RegressorModel` seems rather redundant…
I still prefer something like `Config` and `Fit` as suffixes. I find it simple, intuitive and unambiguous. For example,
fit(cfg::RandomForestConfig, data...) -> fitted::RandomForestFit
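As a minimal sketch of how this naming could read in practice, here is a deliberately trivial one-dimensional ridge regression in place of a random forest. `RidgeConfig`, `RidgeFit`, and the `fit`/`predict` functions are hypothetical names chosen for illustration, not an existing package’s API.

```julia
# Hypothetical names throughout; a sketch of the Config/Fit split.

# Input type: hyperparameters only, no learned state.
Base.@kwdef struct RidgeConfig
    lambda::Float64 = 1.0
end

# Output type: what was learned, plus the config that produced it.
struct RidgeFit
    config::RidgeConfig
    slope::Float64
end

# One-dimensional ridge without intercept: slope = sum(x*y) / (sum(x^2) + lambda).
function fit(cfg::RidgeConfig, x::AbstractVector, y::AbstractVector)
    slope = sum(x .* y) / (sum(abs2, x) + cfg.lambda)
    return RidgeFit(cfg, slope)
end

# Predictions come from the fitted object, never from the config.
predict(fitted::RidgeFit, xnew::AbstractVector) = fitted.slope .* xnew

cfg = RidgeConfig(lambda = 0.0)
fitted = fit(cfg, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
yhat = predict(fitted, [4.0])
```

The type names alone tell the reader which value can make predictions: there is no `predict` method for `RidgeConfig`, only for `RidgeFit`.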
`Algorithm`, `Learner` and `Strategy` seem ambiguous/confusing to me, which this discussion appears to corroborate.
- What do these terms provide over and above `Config`? (honest question, I’m just not seeing it)
- Is the value-add worth the potential confusion that we’re already seeing, even among specialists?
- Will applied researchers have to consult the docs for the meaning of `Algorithm` every time they use the package, which may be only sporadically?
- Does this meaning clash with our understanding of algorithms as procedures/recipes?
`Config` offers complete specification of both simple and sophisticated model types without loss of semantic precision. Just add data.
How about `UnfittedModel` and `FittedModel`?
The downside of those is that they’re kind of long when you tack them on as a suffix, e.g.
RandomForestRegressorUnfittedModel