- MLJ now interfaces with around 100 models, including most of ScikitLearn's,
- the @pipeline macro makes it easy for users to define simple sequences of operations and models,
- there is extensive documentation and a new dedicated repository for tutorials, MLJTutorials, which includes end-to-end examples and a port of the labs from An Introduction to Statistical Learning,
- MLJBase now supports many useful metrics for regression and classification; it also makes it easier for package devs to work with multiple tabular data formats (via Tables.jl) and categorical features (via CategoricalArrays.jl). A short sketch of the basic user workflow follows this list.
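For a flavour of what this looks like from the user side, here is a minimal sketch of the fit/predict/evaluate workflow. The model choice (DecisionTreeRegressor from DecisionTree.jl) is just an example, and the exact behaviour of @load has varied a little between MLJ versions, so treat the loading step as an assumption.

```julia
using MLJ  # assumes MLJ and DecisionTree.jl are installed

# A toy regression problem stored as a Tables.jl-compatible column table.
X = (x1 = rand(200), x2 = rand(200))
y = 2 .* X.x1 .- X.x2 .+ 0.1 .* randn(200)

# Load a model from an interfaced package; depending on the MLJ version,
# @load returns a model instance or a model type to be instantiated.
tree = @load DecisionTreeRegressor pkg=DecisionTree

# Bind model and data in a machine, fit on some rows, predict on the rest,
# and score the predictions with one of MLJBase's metrics (rms).
mach = machine(tree, X, y)
fit!(mach, rows=1:150)
ŷ = predict(mach, rows=151:200)
rms(ŷ, y[151:200])
```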
For devs
MLJ can help you focus on “just” developing models while benefiting from MLJ’s machinery for data pre-processing, hyper-parameter tuning, evaluation metrics, etc.
If you have or know of a Julia package that fits the fit/predict/transform paradigm, please consider adding an interface to MLJ and registering your package with MLJModels. This will allow users to discover, compare and compose (many) models; a rough sketch of what an interface involves is given below.
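To give a rough idea, here is a hedged sketch of a minimal deterministic regressor interface. The name MyRegressor and the ridge-style fit are made up for illustration; check the MLJBase/MLJModels docs for the exact requirements (trait declarations, scientific types, etc.).

```julia
using MLJBase, LinearAlgebra

# A hypothetical deterministic regressor with a single hyper-parameter.
mutable struct MyRegressor <: MLJBase.Deterministic
    lambda::Float64
end
MyRegressor(; lambda=0.0) = MyRegressor(lambda)

# fit receives the model, a verbosity level, the features X (a Tables.jl
# table) and the target y, and returns (fitresult, cache, report).
function MLJBase.fit(model::MyRegressor, verbosity::Int, X, y)
    Xmat  = MLJBase.matrix(X)                          # table -> n x p matrix
    coefs = (Xmat'Xmat + model.lambda * I) \ (Xmat'y)  # ridge-style solve
    return coefs, nothing, NamedTuple()
end

# predict receives the fitresult returned by fit and the new features.
MLJBase.predict(model::MyRegressor, fitresult, Xnew) =
    MLJBase.matrix(Xnew) * fitresult
```

Once something like this works, registering the package with MLJModels is what makes the model discoverable alongside all the others.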
Thanks a lot to those who’ve already been helping us out and/or have given us detailed feedback on MLJ (@samuel_okon, @nignatiadis, @juliohm, @cscherrer, @ZacLN, @jpsamaroo, and many others).
Brief roadmap
In the coming months we will be focusing on:

- polishing the user interface: we would like to stabilise the API for MLJ and MLJBase by the end of February and release a 1.0 for both then,
- adding capacity for more sophisticated hyper-parameter tuning,
- improving MLJ’s support for distributed and multithreaded computing.
See also the suggested projects section for contributors if you’re interested in helping out (thanks!).
Well, anything more sophisticated than grid search or basic random search (which we already support). For instance, we would like to try using or interfacing with Hyperopt.jl (which supports Bayesian optimisation, fancier random search, etc.). I’m not familiar with R’s DiceOptim (thanks for the link).
It would also be nice to try to leverage appropriate tools from JuliaOpt for such things.
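For context, here is roughly what the grid search we already support looks like. The model, the tuned hyper-parameter (max_depth) and the keyword names are assumptions that may differ slightly between versions.

```julia
using MLJ  # assumes MLJ and DecisionTree.jl are installed

# Toy data and a model with a numeric hyper-parameter to tune.
X = (x1 = rand(200), x2 = rand(200))
y = 2 .* X.x1 .- X.x2 .+ 0.1 .* randn(200)
tree = @load DecisionTreeRegressor pkg=DecisionTree

# A one-dimensional range over the tree depth.
r = range(tree, :max_depth, lower=1, upper=10)

# Wrap the model in a self-tuning model: plain grid search, 6-fold CV, rms.
# (The range keyword is `ranges` in some versions and `range` in others.)
tuned_tree = TunedModel(model=tree, tuning=Grid(resolution=10),
                        resampling=CV(nfolds=6), ranges=[r], measure=rms)

# The tuned model is itself a model: bind it to data and fit as usual.
mach = machine(tuned_tree, X, y)
fit!(mach)
```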
On the streaming side, interop with OnlineStats is being considered (maybe @ablaom can comment on this).
With respect to distributed/out-of-core, MLJ is built around compatibility with the Tables.jl interface; it works with JuliaDB, for instance, and would work with anything else that implements the Tables.jl interface.
Maybe it’s useful to stress that MLJ is effectively “just” a tool to use and compose other models: some are offered by MLJ itself, but most are external. By design, the interface to a given package receives the data as a Tables.jl table, but the implementation of the fit function is then free to do whatever it wants to get the data into a form the model can process.

To help with this, we provide utility functions that convert the data if the package requires it, for instance to matrices with or without transposition, depending on what is expected.

To give an example, a number of algorithms in the Stats ecosystem expect a p × n matrix as input by convention, whereas the convention in MLJ is n × p; the MLJ interface can provide this as an adjoint view. In the unfortunate case where the algorithm does not support adjoints, a copy is needed but, fingers crossed, MultivariateStats and the like will gradually be updated so that we don’t have to copy data around when it is not required.
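Concretely, an interface whose wrapped algorithm expects the p × n convention might do something like the following; prepare_input is a hypothetical helper, and MLJBase.matrix is the relevant conversion utility.

```julia
using MLJBase

# Hypothetical helper used inside a package interface's fit: MLJ hands the
# interface a Tables.jl table `X` following the n x p (rows-as-observations)
# convention.
function prepare_input(X)
    Xmat = MLJBase.matrix(X)   # n x p Matrix
    Xt   = Xmat'               # p x n adjoint view, no extra copy
    # If the wrapped algorithm cannot handle an adjoint, materialize a copy:
    # Xt = permutedims(Xmat)
    return Xt
end
```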