[ANN] LearnAPI.jl - Proposal for a basement-level machine learning API

People use the term model to refer both to the ‘abstract model’ and to the ‘trained model’. For example, according to the Microsoft documentation, model refers to the learned model, but Lux uses model to refer to the abstract model plus hyperparams.

I don’t mind the Lux approach: using Model to refer to the abstract model (this includes hyperparameters) and ModelArtifact or LearnedModel to refer to the artifact needed to generate predictions (which contains learned parameters, weights, coefficients, etc.).

Usually MLOps people use model for the ‘abstract model that one can instantiate’ and model artifact for the file(s) someone needs to productionize a model (which can be a folder containing hyperparams, learned parameters, and any metadata or information used to get predictions).

I have heard Model used to refer to the abstract model as well as the learned model, but I have never seen model artifact or learned model used to refer to a model that has not been trained.

This seems to naturally lead to using concepts that define each other:

  • Model vs ModelArtifact:

    • Model is the abstract model defined with hyperparams
    • ModelArtifact refers to model + learned params.
  • ModelHyperparams vs Model:

    • ModelHyperparams refers to the model hyperparams
    • Model refers to model + learned params.

Could the previous naming convention pairs (if used together in the same context) confuse anyone?
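For concreteness, here is a minimal sketch of the first pairing (Model vs ModelArtifact). The struct and function names below are purely illustrative, not part of LearnAPI.jl or any existing package:

```julia
# "Model": the abstract model, holding only hyperparameters.
struct MeanRegressor
    shrink::Float64               # a hyperparameter
end

# "ModelArtifact": the model together with its learned parameters.
struct MeanRegressorArtifact
    model::MeanRegressor
    mean::Float64                 # a learned parameter
end

# Training consumes a Model and returns a ModelArtifact.
train(model::MeanRegressor, y) =
    MeanRegressorArtifact(model, model.shrink * sum(y) / length(y))

# Prediction requires the artifact, never the bare model.
predict_mean(artifact::MeanRegressorArtifact, n) = fill(artifact.mean, n)
```

The point of the pairing is that the type system itself enforces the distinction: predict_mean cannot be called on an untrained MeanRegressor.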

I like the design that @CameronBieganek proposed, but I would not use Options; there is a word that is understood by everyone in ML, which is HyperParameters or HyperParams.

Following this nomenclature I would have an implementation for ridge as follows:

using LearnAPI
using LinearAlgebra
using Tables

struct RidgeRegressorHyperparams
    lambda::Float64
end

RidgeRegressorHyperparams(; lambda=0.1) = RidgeRegressorHyperparams(lambda)

struct RidgeRegressor
    hyperparams::RidgeRegressorHyperparams
    coefs::Vector{Float64}
    importances::Vector{Pair{Symbol, Float64}}
end

function LearnAPI.fit(hyperparams::RidgeRegressorHyperparams, X, y; verbosity=0)
    x = Tables.matrix(X)
    s = Tables.schema(X)
    features = s.names

    # ridge solution: (xᵀx + λI) \ xᵀy
    coefs = (x'x + hyperparams.lambda*I)\(x'y)

    importances = [features[j] => abs(coefs[j]) for j in eachindex(features)]
    reverse!(sort!(importances, by=last))

    verbosity > 0 && @info "Features in order of importance: $(first.(importances))"

    RidgeRegressor(hyperparams, coefs, importances)
end

LearnAPI.predict(model::RidgeRegressor, Xnew) = Tables.matrix(Xnew) * model.coefs
LearnAPI.feature_importances(model::RidgeRegressor) = model.importances

StructArray(a=eachcol(A), b=[...], c=[...]) should represent this just fine. Views of such a StructArray would preserve the structure.
This could be a performance optimization: if a component is a Slices, use the optimized approach. At the same time, passing a StructArray, or a regular array with one Vector for each element, would work the same, just less efficiently. That’s convenient when you don’t want to assemble all the a components into a single matrix manually.

@aplavin Perhaps you meant to post this in another thread? I’m not understanding the relevance here.

It’s a direct answer to the @ExpandingMan comment above, see the quotation.


Right, my mistake!

@ExpandingMan (and others): just a quick acknowledgement of your LearnAPI.jl review, which is most helpful and well-informed. I am busy with other things, but rest assured this and other feedback are marinating at the back of my mind and I will return to this in due course.


I’ve made a few changes to LearnAPI in response to some of the discussion:

  • “model” structs are now called “algorithms”
  • predict is now dispatched on the kind of target proxy (LiteralTarget, Distribution, SurvivalFunction, etc.)

Please see this PR for details.

Just 2c from me: I feel like “algorithms” isn’t a great fit, to me at least that’s more about the core mechanism at play than the configuration of it. Regarding the comment you made:

After further thought (and I got as far as writing a new PR) I am less fond of “learner”. The problem is that plenty of machine learning algorithms don’t really learn (don’t generalize to new data).

I think there’s another aspect to this worth considering, the fact that the proposed package name is “LearnAPI”. In that respect “Learner” appears to be quite a good fit. I suppose you could switch to “AlgorithmsAPI” … but that seems unhelpfully broad.


I also dislike the term algorithm. I shared a comment in this thread explaining that a learning algorithm is a procedure like maximum likelihood estimation (MLE) that is used to learn the parameters of a learning model. These terms are well established in classical machine learning textbooks.


@juliohm I don’t understand how the use of “algorithm” in the updated docs is inconsistent with this view. It is, in fact, more inclusive, because there are algorithms in machine learning where there is no explicit statistical model (eg, DBSCAN clustering) but they still need a struct storing configuration parameters. As I see it, the change perfectly addresses the earlier objection to the use of “model” in the context of configuring an algorithm which is not itself a statistical model, only a means for learning one (or performing some other ML task).

@tecosaur Yes, I do agree: “Algorithm” sounds more about function than structure, and this is a compromise. Also, you don’t just choose a configuration. You choose an algorithm and a configuration of the algorithm. Ideally the word (or short composite word) should encapsulate both ideas. But it shouldn’t be a mouthful, or pose grammatical awkwardness in documentation. Nothing suggested so far meets all these criteria, as far as I can tell.

There have been opportunities for people to make alternative suggestions for the name of the configuration struct, here and in this issue. With all due respect for the suggestions and engagement on this question, the only names I care to entertain now are “algorithm”, “strategy” or “learner”. (As noted in the issue, not all algorithms learn, ie generalize, so that is a minor drawback of the latter name).

What do people prefer, “algorithm”, “strategy” or “learner”? (edited)

If I can get 4 or more votes, I’m happy to go with the majority.


I know this is not one of your preferred choices, but I still prefer Options. If I had to choose between Algorithm and Learner, I guess I would go with Algorithm.


Of the two, I certainly prefer Learner. I must admit I do not relate to the concern that some instances will not be generalizing to new data—after all, I might “learn” a new vocab word every day, but that won’t necessarily help me generalize to knowing more words. I do relate to the concern that Algorithm is a pretty unintuitive term for a struct of configuration parameters. I usually think of an algorithm as having more to do with the actual implementation details of a particular learning process, and the hyperparameters define the search space for that learner.

That being said, I can also relate to any concern that Learner might not be the most phonetically pleasing word.

With all due respect for the suggestions and engagement on this question, the only names I care to entertain now

Well, ok :slight_smile: . If your mind ever changes on this, I suspect there is a word that can “encapsulate both algorithm and configuration” lurking behind a bit more brainstorming. Some unfiltered, possibly goofy, ideas:

  • Metamodel or Protomodel
  • Schema or Modelschema
  • Learnspace
  • Blueprint
  • Strategy as suggested before
  • There’s always Confalgurithm :stuck_out_tongue:

The word “configuration” has come up quite a few times in this thread. The abbreviation config is very commonly used in software development. So how about we just use the suffix Config for these structs? E.g. RandomForestRegressorConfig, SupportVectorClassifierConfig, etc.

Then, if desired, the documentation could refer to such an object as the “model configuration”.
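As a rough sketch of what that convention might look like (the struct name and field here are illustrative only, not an actual API):

```julia
# A "model configuration" struct under the proposed Config suffix convention.
# Base.@kwdef generates a keyword constructor with the given defaults.
Base.@kwdef struct RidgeRegressorConfig
    lambda::Float64 = 0.1   # regularization strength
end

default_config = RidgeRegressorConfig()            # lambda = 0.1
custom_config  = RidgeRegressorConfig(lambda=0.5)  # override the default
```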


@adienes I’d forgotten about my earlier proposal of “strategy” and have added that to the choices above.


In that case, I think my ballot would read Strategy ≻ Learner ≻ Algorithm in that order


I guess I would go with Strategy if Options and Config are not on the table. :stuck_out_tongue:

Config or Options > Strategy > Learner > Algorithm


Of the choices given: Strategy


I still think adding an extra word (suffix) to Model might make the documentation self-evident about which one is just a struct and which one is “the whole object” from which one can make predictions. I haven’t heard anyone complain that, on finding any of the following pairs in the same documentation, they would be confused about what they are:

  • Model vs ModelHyperParams
  • Model vs ModelStrategy
  • Model vs ModelStruct
  • Model vs ModelConfig
  • Model vs ModelBluePrint
  • Model vs ModelSchema

Having two names that define each other (kind of small and big) tells you which one is just a “schema/struct” vs the object used to generate predictions.

Of the proposed ones, for the hyperparams struct I would use Strategy. But still, ModelStrategy for the struct and Model for the model instance would make it more evident which one is a model “ready to be used” and which one is a “configuration struct” for the model.

Arguing indefinitely about nomenclature is not super productive, but I have to thank @ablaom for trying to get different points of view! At the end of the day, it is the smart way to build a community that will later use this API.


After having read the current version of LearnAPI.jl I like the term “algorithm” (“a precise step-by-step plan for a computational procedure”).

Some examples that I tried:

# Fitting a neural network classifier
alg = NeuralNetworkClassifier(builder = Short(n_hidden = 10), epochs = 3)
params, state, = fit(alg, 0, X, y)
y, = predict(alg, LiteralTarget, params, Xtest)
# continue training
alg.epochs = 10
params, state = update!(alg, 0, state, params, X, y)

# performing PCA
alg = PCA(variance_ratio = 1)
pcs, = fit(alg, 0, X)
scores = transform(alg, pcs, X2)

# hierarchical clustering
alg = HierarchicalClustering(k = 3, linkage = :complete)
_, _, report = fit(alg, 0, X)
report.cutter(h = 4) # returns cluster assignments

# Automatic Hyperparameter Tuning
alg = RidgeRegressor()
tuned_alg = TunedAlgorithm(algorithm = alg,
                           resampling = CV(nfolds = 10),
                           range = range(alg, :lambda,
                                         lower = 1e-9, upper = 1e-1))
params, _, report = fit(tuned_alg, 0, X, y)
y, = predict(tuned_alg, LiteralTarget, params, Xtest)

I would also be fine with strategy (I guess I would replace alg by strat in the examples above) or learner. Everything else, e.g. config, options or hyper, would feel a bit awkward to me. Or would you be fine with fit(options, 0, X, y)?
