I’ve had a quick read of the proposal, and a bit of the current interface code.
Overall, it looks brilliant, @ablaom. I’m so glad you’ve taken the time to continue this effort. I’ve only got a few niggles that I’d like to raise:
I think Cameron’s minimise function could be better named something like trim, shrink, or even downsize or deflate, which convey the intent more effectively. Otherwise, I’m prone to thinking of minimising model error or minimising objectives.
In src/types.jl, I see the abstract types Finite, Iterable, and FiniteIterable. These look like interface types. Could they perhaps be better served by boolean trait functions (e.g. Tables.istable) or Holy-style trait functions?
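To make the trait-function alternative concrete, here is a rough Python analogue. LearnAPI itself is Julia, where this would use multiple dispatch. Here `functools.singledispatch` stands in for dispatch, and `is_finite`, `is_iterable`, `Ridge` and `KMeans` are hypothetical names that are not part of LearnAPI. The sketch shows that independent boolean traits compose, so no combined `FiniteIterable` type is needed.

```python
from functools import singledispatch


# Hypothetical algorithm types standing in for Julia structs.
class Ridge:
    pass


class KMeans:
    pass


# Boolean trait functions with conservative defaults, in the spirit of
# Tables.istable: a type opts in by registering an overload.
@singledispatch
def is_finite(algorithm) -> bool:
    return False


@singledispatch
def is_iterable(algorithm) -> bool:
    return False


# KMeans opts in to both traits independently; no combined
# "FiniteIterable" supertype is required.
@is_finite.register(KMeans)
def _(algorithm) -> bool:
    return True


@is_iterable.register(KMeans)
def _(algorithm) -> bool:
    return True


def traits(algorithm) -> tuple:
    """Report the traits of an algorithm instance as (finite, iterable)."""
    return (is_finite(algorithm), is_iterable(algorithm))


print(traits(Ridge()))   # (False, False)
print(traits(KMeans()))  # (True, True)
```

A Holy-style trait would instead return a singleton trait instance (for example, a hypothetical `Finite()`) that further methods dispatch on. The argument about composition is the same.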
I still think that, for a package named LearnAPI for machine learning models, calling the objects that “learn” from data “learners” rather than “algorithms” is more apt (regardless of how well they generalise). That said, it seems this now mostly affects the docs, and I don’t mean to restart the bikeshedding.
The name for the function that maps learning outcomes (the output of fit) to something suitable for serialisation. At the time of writing, this is called minimise in the docs.
Okay, here’s one more. LearnAPI currently has inverse_transform, “broadly” understood: I’ve said it can be any right inverse or approximate right inverse of transform (maybe any one-sided inverse should be allowed??). This is the same name used in scikit-learn and MLJ, so it has a lot of inertia for me personally. No promises to change it, but I’d like to know what people think of the name.
I think StatsAPI uses reconstruct, and TableTransforms uses revert (with apply in place of transform). Any other name suggestions or comments before I poll this?
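One way to read “right inverse” here is that `transform(inverse_transform(z))` returns `z` exactly, while `inverse_transform(transform(x))` only approximately recovers `x`. The Python sketch below assumes a hypothetical transformer that centres the data and keeps only the first `k` features. It is an illustration, not LearnAPI or scikit-learn code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TruncateFitted:
    """Hypothetical fitted transformer: per-feature means and features kept."""
    means: List[float]
    k: int


def fit(X: List[List[float]], k: int) -> TruncateFitted:
    """Learn per-feature means from the rows of X."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return TruncateFitted(means, k)


def transform(model: TruncateFitted, x: List[float]) -> List[float]:
    # Centre, then drop all but the first k features (not injective).
    return [x[j] - model.means[j] for j in range(model.k)]


def inverse_transform(model: TruncateFitted, z: List[float]) -> List[float]:
    # Undo the centring; dropped features are filled in with their means.
    kept = [z[j] + model.means[j] for j in range(model.k)]
    return kept + model.means[model.k:]


model = fit([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], k=2)
print(transform(model, inverse_transform(model, [0.5, -1.0])))  # [0.5, -1.0]
print(inverse_transform(model, transform(model, [1.0, 2.0, 3.0])))  # [1.0, 2.0, 4.0]
```

The first round trip is exact (a right inverse of transform). The second loses the dropped feature and substitutes its mean, so it is only an approximate left inverse.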
I preferred the way it was done with TransformedTargetModel, where it is a regular algorithm with fit() and predict(), and the algorithm’s parameters include the inner regression algorithm along with the transformer and inverse functions. In that case, these are just the names of algorithm parameters, and may not be needed in the API.
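Here is a minimal Python sketch of that wrapper design. It assumes hypothetical `MeanRegressor` and `TransformedTargetRegressor` types and generic `fit`/`predict` functions, modelled loosely on the fit-then-predict pattern discussed in this thread rather than on the actual MLJ or LearnAPI code. The transformer and its inverse are plain hyperparameters, so no dedicated inverse-transform method is needed.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class MeanRegressor:
    """Hypothetical inner algorithm: predicts the mean of the training target."""


@dataclass(frozen=True)
class TransformedTargetRegressor:
    """Hypothetical wrapper: the inner algorithm, the target transformer and
    its inverse are ordinary hyperparameters of a regular algorithm."""
    regressor: object
    transformer: Callable[[float], float]
    inverse: Callable[[float], float]


@dataclass(frozen=True)
class Fitted:
    """Output of fit: the algorithm plus whatever state predict needs."""
    algorithm: object
    state: object


def fit(algorithm, X: Sequence, y: Sequence[float]) -> Fitted:
    if isinstance(algorithm, MeanRegressor):
        return Fitted(algorithm, sum(y) / len(y))
    if isinstance(algorithm, TransformedTargetRegressor):
        # Train the inner algorithm on the transformed target.
        inner = fit(algorithm.regressor, X, [algorithm.transformer(v) for v in y])
        return Fitted(algorithm, inner)
    raise TypeError(f"unsupported algorithm: {algorithm!r}")


def predict(model: Fitted, X: Sequence) -> List[float]:
    if isinstance(model.algorithm, MeanRegressor):
        return [model.state for _ in X]
    if isinstance(model.algorithm, TransformedTargetRegressor):
        # Map the inner predictions back to the original target scale.
        return [model.algorithm.inverse(v) for v in predict(model.state, X)]
    raise TypeError(f"unsupported model: {model!r}")


algorithm = TransformedTargetRegressor(MeanRegressor(), math.log, math.exp)
model = fit(algorithm, X=[[0], [1]], y=[1.0, 100.0])
print(predict(model, [[2]]))  # approximately [10.0]
```

With the log/exp pair, the wrapper predicts the geometric mean of the training target (10 for 1 and 100) rather than the arithmetic mean.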
Also, on the proposed minimize() function: some of the suggested names imply that the trained model is mutated. If the model is an immutable struct, then instead of minimize(), maybe this could just be another accessor function that returns the struct for the trained algorithm, representing the information required by predict().
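Here is a small Python sketch of the accessor idea, using hypothetical names (`FittedLinear`, `LinearPredictor`, `predictor`). The accessor builds a new, smaller frozen object that holds only what `predict` needs. The full fitted object is left untouched, so the operation does not read as a mutation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FittedLinear:
    """Hypothetical output of fit: coefficients plus training-only extras."""
    coefs: Tuple[float, ...]
    training_residuals: Tuple[float, ...]  # useful for inspection, not for predict


@dataclass(frozen=True)
class LinearPredictor:
    """Hypothetical minimal object: only what predict needs."""
    coefs: Tuple[float, ...]


def predictor(model: FittedLinear) -> LinearPredictor:
    """Hypothetical accessor: returns a new, smaller object; `model` is unchanged."""
    return LinearPredictor(model.coefs)


def predict(model, x) -> float:
    # Works on both the full and the minimal object, since both carry coefs.
    return sum(c * xi for c, xi in zip(model.coefs, x))


full = FittedLinear(coefs=(2.0, -1.0), training_residuals=(0.1, -0.3, 0.2))
small = predictor(full)
print(predict(full, [3.0, 1.0]), predict(small, [3.0, 1.0]))  # 5.0 5.0
print(full.training_residuals)  # (0.1, -0.3, 0.2)
```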
The trait LearnAPI.functions returns a list of functions that can be applied either to the algorithm struct (e.g. fit) or to the output of fit (e.g. predict). See here. Should this be called methods instead?