MLJ - Machine Learning in Julia
MLJ is a new, flexible framework for composing and tuning supervised and unsupervised learning models currently scattered across assorted Julia packages, as well as models wrapped from other languages. The MLJ project also seeks to focus efforts in the Julia ML community, and in particular to improve the interoperability and maintainability of key ML packages.
The package has been developed primarily at The Alan Turing Institute but enjoys a growing list of advisors and contributors. If you like the project, please star the GitHub repository to boost its prospects in a pending funding review.
Quick links
- Video from London Julia User Group meetup in March 2019 (skip to demo at 21:39)
- Basic Usage and Tour
- Building a self-tuning random forest
- An MLJ docker image (including tour)
- Implementing the MLJ interface for a new model
- How to contribute
Key implemented features
- Learning networks. Flexible model composition beyond traditional pipelines (more on this below).
- Automatic tuning. Automated tuning of hyperparameters, including those of composite models. Tuning is implemented as a model wrapper, for composition with other meta-algorithms (see the sketch after this list).
- Homogeneous model ensembling.
- Registry for model metadata. Metadata is available without loading model code; it is the basis of a "task" interface and facilitates model composition.
- Task interface. Automatically match models to specified learning tasks, to streamline benchmarking and model selection.
- Clean probabilistic API. Improves support for Bayesian statistics and probabilistic graphical models.
- Data container agnostic. Present and manipulate data in your favorite Tables.jl format.
- Universal adoption of categorical data types. Enables model implementations to properly account for classes seen in training but not in evaluation.
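The following minimal sketch illustrates how several of these pieces fit together: tabular input data, probabilistic prediction, and tuning as a model wrapper. The syntax follows later MLJ releases, so the exact macro and keyword names (`@load`, `TunedModel`, `machine`, and friends) should be treated as assumptions relative to the version this README describes.

```julia
using MLJ

# Any Tables.jl-compatible container works; here we use a bundled toy dataset.
X, y = @load_iris

# Load a model implementation from a registered package (here DecisionTree.jl).
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
tree = Tree()

# Tuning is itself a model wrapper, so the tuned model composes with
# other meta-algorithms (ensembling, pipelines, ...).
r = range(tree, :max_depth, lower=1, upper=10)
tuned_tree = TunedModel(model=tree,
                        tuning=Grid(resolution=10),
                        resampling=CV(nfolds=5),
                        range=r,
                        measure=cross_entropy)

# Bind the wrapped model and the data to a machine, fit, and predict.
mach = machine(tuned_tree, X, y)
fit!(mach)
yhat = predict(mach, X)   # probabilistic predictions (a vector of distributions)
```

Because the tuned model is just another model, it can in turn be wrapped (for example, in a homogeneous ensemble) or inserted into a learning network, which is the composition pattern the framework is built around.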
Some planned enhancements
- Integrate deep learning packages, such as Flux.jl.
- Model-agnostic gradient-descent tuning using automatic differentiation.
- Enhance support for time series and sparse data.
- Add support for heterogeneous/distributed architectures.
- Package common learning network architectures (linear pipelines, stacks, etc.) as simple one-line operations.
- Implement systematic benchmarking for models matching a given task.
- Automate estimates of CPU and memory requirements for a given task/model.
- Implement DAG-style scheduling.
- Extend and integrate existing loss-function libraries to better handle probabilistic prediction.
- Add interpretable machine learning measures.
- Add online learning support.
Feedback and offers of help are very welcome!