A lot of work has been done on MLJ and associated repos since @ablaom’s last announcement some 7 months ago, so here’s a brief update:
- MLJ now interfaces with around 100 models including most of ScikitLearn’s,
- the `@pipeline` macro makes it easy for users to define simple sequences of operations and models,
- there is now extensive documentation and a new dedicated repository for tutorials: MLJTutorials, which includes end-to-end examples and a port of the Introduction to Statistical Learning’s labs,
- MLJBase now supports many useful metrics for regression and classification; it also makes it easier for package devs to work with multiple tabular data formats (via Tables.jl) and categorical features (via CategoricalArrays.jl).
For devs
MLJ can help you focus on “just” developing models and benefit from MLJ’s machinery for data pre-processing, hyper-parameter tuning, evaluation metrics, etc.
If you have or know of a Julia package that fits the `fit`/`predict`/`transform` pattern, please consider adding an interface to MLJ and registering your package with MLJModels. This will allow users to discover, compare and compose (many) models.
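For anyone unsure whether their package "fits the idea", here is a rough, self-contained sketch of the fit/predict pattern in plain Julia. To be clear, this is not MLJ's actual model interface: `ConstantRegressor`, its `shrinkage` field, and the `fit`/`predict` signatures below are all made up for illustration, and the real requirements are described in the MLJ documentation. The point is only the separation between a struct holding hyper-parameters and the learned state returned by `fit`.

```julia
# Hypothetical illustration only: none of these names are MLJ's API.
using Statistics  # standard library, provides `mean`

# The model struct holds hyper-parameters only, never learned state.
struct ConstantRegressor
    shrinkage::Float64  # made-up hyper-parameter in [0, 1]
end

# fit: learn from the data and return the learned state (here a single number).
function fit(model::ConstantRegressor, X, y)
    fitresult = (1 - model.shrinkage) * mean(y)
    return fitresult
end

# predict: use the learned state to produce one prediction per row of Xnew.
predict(model::ConstantRegressor, fitresult, Xnew) = fill(fitresult, size(Xnew, 1))

# Toy data: 3 observations, 2 features.
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [1.0, 2.0, 3.0]

model = ConstantRegressor(0.0)
fitresult = fit(model, X, y)
yhat = predict(model, fitresult, X[1:2, :])
```

If your package can be expressed in roughly this shape (hyper-parameters in a struct, a training step, and a prediction or transformation step), it is probably a good candidate for an interface.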
Thanks a lot to those who’ve already been helping us out and/or have given us detailed feedback on MLJ (@samuel_okon, @nignatiadis, @juliohm, @cscherrer, @ZacLN, @jpsamaroo, and many others).
Brief roadmap
In future months we will be focusing on
- polishing the user interface: we would like to stabilise the API for MLJ and MLJBase by the end of February and release a 1.0 for both then,
- adding capacity for more sophisticated hyper-parameter tuning,
- improving MLJ’s support for distributed & multithreaded computing.
See also the suggested projects section for contributors if you’re interested in helping out (thanks!).
Have a nice weekend