MLJ - A machine learning toolbox for Julia

MLJ - Machine Learning in Julia

MLJ is a new, flexible framework for composing and tuning supervised and unsupervised learning models. It covers models currently scattered across assorted Julia packages, as well as wrapped models from other languages. The MLJ project also seeks to focus efforts in the Julia ML community, and in particular to improve the interoperability and maintainability of key ML packages.

The package has been developed primarily at The Alan Turing Institute but enjoys a growing list of advisors and contributors. If you like the project, please star the GitHub repo to boost the prospects of a pending funding review.

Quick links

☞ MLJ vs ScikitLearn.jl

☞ Video from London Julia User Group meetup in March 2019 (skip to demo at 21’39)

☞ Basic Usage and Tour

☞ Building a self-tuning random forest

☞ An MLJ docker image (including tour)

☞ Implementing the MLJ interface for a new model

☞ How to contribute

☞ Julia Slack channel: #mlj.

Key implemented features

  • Learning networks. Flexible model composition beyond traditional
    pipelines (more on this below).

  • Automatic tuning. Automated tuning of hyperparameters, including
    those of composite models. Tuning is implemented as a model wrapper,
    so it composes with other meta-algorithms.

  • Homogeneous model ensembling.

  • Registry for model metadata. Metadata is available without loading
    model code. The registry forms the basis of a “task” interface and
    facilitates model composition.

  • Task interface. Automatically match models to specified learning
    tasks, to streamline benchmarking and model selection.

  • Clean probabilistic API. Improves support for Bayesian
    statistics and probabilistic graphical models.

  • Data container agnostic. Present and manipulate data in your
    favorite Tables.jl format.

  • Universal adoption of categorical data types. Enables model
    implementations to properly account for classes seen in training but
    not in evaluation.
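
As a rough illustration of the categorical data point above, the sketch below assumes that CategoricalArrays.jl (a package not named in this README) supplies the categorical type. It shows that a subset of a categorical vector still records every class of the original vector, including a class that does not occur in the subset. The variable names are hypothetical.

```julia
using CategoricalArrays  # assumption: CategoricalArrays.jl provides the categorical type

# Full target vector with three classes.
y = categorical(["setosa", "versicolor", "virginica", "setosa", "versicolor"])

# Hypothetical evaluation rows; "virginica" does not occur among them.
evaluation = y[[1, 2, 4, 5]]

# Number of classes that actually occur in the subset.
observed = length(unique(evaluation))

# Classes recorded in the pool carried along with the subset.
known = levels(evaluation)

println("classes observed in subset: ", observed)
println("classes known to subset:    ", known)
```

In this sketch the subset contains two distinct classes, yet `levels` still reports all three. A model implementation can therefore learn the complete set of classes from any slice of the data rather than only those that happen to appear in it.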

Some planned enhancements

  • Integrate deep learning packages, such as Flux.jl.

  • Model-agnostic gradient descent tuning using automatic
    differentiation.

  • Enhance support for time series and sparse data.

  • Add support for heterogeneous/distributed architectures.

  • Package common learning network architectures (linear pipelines,
    stacks, etc.) as simple one-line operations.

  • Implement systematic benchmarking for models matching a given task.

  • Automated estimates of CPU and memory requirements for a given task/model.

  • Implement DAG-style scheduling.

  • Extend and integrate existing loss function libraries to better handle
    probabilistic prediction.

  • Add interpretable machine learning measures.

  • Add online learning support.

Feedback and offers of help are very welcome!