Pros and cons of using Flux directly over using MLJ/MLJFlux

The following question was posted on Slack:

“Can anyone guide me on the relative merits of using MLJ with MLJFlux vs. using Flux directly?”


First, note that there are alternatives to MLJ for interacting with Flux that are under development, in particular the FastAI.jl project. I am not so familiar with that project (cc @ToucheSir) but will point out one distinction: FastAI.jl is focused on wrapping functionality around Flux.jl models, i.e., around so-called deep learning models. This generally means neural networks, but in fact includes any model defined by a differentiable function that is trained using some variant of gradient descent, a large and useful class. MLJ, on the other hand, is a multi-paradigm machine learning toolbox that attempts to add functionality to a much wider class of models: tree-based models, support vector machines, Gaussian mixture models, nearest neighbour models, probabilistic programming models, etc. If you believe your problem can be solved using deep learning models alone (likely if it involves images, possibly if it involves language or audio), then FastAI.jl may ultimately have an advantage over MLJ, whose goals are broader.

Returning to the original question, Flux provides the bare bones for building and training deep learning models, so the advantage of using Flux directly is maximum flexibility. In principle, there’s not much in deep learning that you cannot implement using Flux.jl alone. In particular, you can do things like reinforcement learning and adversarial learning that you could not do through MLJFlux. You will also have control over certain decisions about data representation that can make a difference if you are working with very large data sets. Currently, for example, MLJFlux does not really support sparse data (although this is on our radar).
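To make the flexibility point concrete, here is a minimal sketch of a hand-written Flux training loop doing something MLJFlux's `fit` does not: adversarial training in the FGSM style, where each step also trains on inputs nudged along the sign of the loss gradient with respect to the inputs. It assumes Flux 0.14 or later (the explicit `Flux.setup` / `Flux.withgradient` / `Flux.update!` API); the toy data, layer sizes, number of epochs and perturbation size `ε` are made up for illustration.

```julia
using Flux

# Toy data (made up): 64 observations with 4 features each, stored as columns
x = rand(Float32, 4, 64)
y = Flux.onehotbatch(rand(1:3, 64), 1:3)

model = Chain(Dense(4 => 16, relu), Dense(16 => 3))
opt_state = Flux.setup(Adam(1e-3), model)

loss(m, x, y) = Flux.Losses.logitcrossentropy(m(x), y)

ε = 0.01f0            # size of the adversarial perturbation (arbitrary choice)
losses = Float32[]    # training loss recorded at each step

for epoch in 1:10
    # Gradient of the loss with respect to the *inputs*, not the parameters
    gx = Flux.gradient(xi -> loss(model, xi, y), x)[1]
    x_adv = x .+ ε .* sign.(gx)    # FGSM-style perturbed inputs

    # One custom step: train on clean and perturbed inputs together
    l, grads = Flux.withgradient(m -> loss(m, x, y) + loss(m, x_adv, y), model)
    Flux.update!(opt_state, model, grads[1])
    push!(losses, l)
end
```

Nothing here is special to adversarial training: the point is that the loop body is ordinary Julia code, so any gradient computation or update rule can be slotted in.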

However, machine learning is more than just building and training a single model. For example, for ordinary supervised learning tasks, you will generally want to do one or more of the following:

  • estimate performance of your model using a holdout set or other resampling strategy (eg, cross-validation) as measured by one or more metrics (loss functions) that may not have been used in training

  • optimise hyper-parameters such as a regularisation parameter (eg, dropout) or the width, height or number of channels of a convolution layer

  • control iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (eg, cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • compose with other models, such as introducing data pre-processing steps (eg, missing data imputation) into a pipeline. It might make sense to include non-deep learning models in this pipeline. Other kinds of model composition could include blending the predictions of a deep learner with some other kind of model (as in “model stacking”)

  • compare your model with non-deep learning models

You can basically do all this right now with MLJ/MLJFlux, without writing much extra code yourself, which is the main advantage of doing so. If you are already familiar with MLJ workflows, this is an attractive option for run-of-the-mill supervised learning tasks. If you are not familiar with MLJ, then some time is required to learn it, but substantial resources are available. One must also spend a little time understanding how MLJFlux builds a Flux model after inspecting your data (the “builder” idea discussed in the documentation).
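As a rough illustration of how little code the workflows listed above need, here is a sketch assuming recent versions of MLJ and MLJFlux are installed in the active environment. It uses MLJ's built-in iris data and MLJFlux's default builder; the number of epochs, tuning `lambda` (MLJFlux's regularisation strength) as the example hyper-parameter, and the particular early-stopping controls are arbitrary choices made for this example.

```julia
using MLJ

# Load the model type (MLJFlux must be installed in the active environment)
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

X, y = @load_iris                         # small built-in demonstration dataset
clf = NeuralNetworkClassifier(epochs=50)  # default builder, default optimiser

# 1. Estimate performance with 5-fold cross-validation
e = evaluate(clf, X, y;
             resampling=CV(nfolds=5, shuffle=true, rng=123),
             measure=log_loss, verbosity=0)

# 2. Wrap the model for hyper-parameter tuning over the regularisation strength
r = range(clf, :lambda, lower=1e-4, upper=1e-1, scale=:log)
tuned = TunedModel(model=clf, tuning=Grid(resolution=5),
                   resampling=CV(nfolds=3), range=r, measure=log_loss)

# 3. Wrap the model for early stopping on an out-of-sample loss
iterated = IteratedModel(model=clf,
                         resampling=Holdout(fraction_train=0.7),
                         measure=log_loss,
                         controls=[Step(1), Patience(5), NumberLimit(100)])

# 4. Compose with a pre-processing step in a pipeline
pipe = Standardizer() |> clf
```

Each of `tuned`, `iterated` and `pipe` is itself an MLJ model, so it is trained the usual way with `mach = machine(tuned, X, y); fit!(mach)`, and the wrappers can be nested or passed to `evaluate` in turn.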

Even if you are using MLJ/MLJFlux, you will need to be familiar with the Flux API so you can define an appropriate MLJFlux “builder”. Most people will probably find it helpful to make their first project a pure Flux.jl project, to be sure they understand the principles and to get a good mental model of what MLJFlux is doing under the hood.
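For reference, a custom builder is just a type together with a method for `MLJFlux.build`, which MLJFlux calls with a random number generator and the input and output sizes it has inferred from the data. The sketch below assumes a recent MLJFlux with the `MLJFlux.build(builder, rng, n_in, n_out)` interface and Flux 0.13 or later (for the `in => out` layer syntax); the `OneHidden` name and its single field are hypothetical.

```julia
using Flux, MLJFlux, Random

# Hypothetical builder with one hidden layer; declared mutable so that MLJ
# can tune its field as a nested hyper-parameter
mutable struct OneHidden <: MLJFlux.Builder
    n_hidden::Int
end

# MLJFlux calls this once it has inspected the data and knows n_in and n_out
function MLJFlux.build(b::OneHidden, rng, n_in, n_out)
    init = Flux.glorot_uniform(rng)   # seeded weight initialiser
    return Chain(Dense(n_in => b.n_hidden, relu; init=init),
                 Dense(b.n_hidden => n_out; init=init))
end

# Quick check outside MLJ: 4 input features, 3 outputs, 10 observations
chain = MLJFlux.build(OneHidden(8), Random.default_rng(), 4, 3)
out = chain(rand(Float32, 4, 10))
```

Inside MLJ you would not call `build` yourself; you would pass `builder=OneHidden(16)` to a model such as `NeuralNetworkClassifier`, and could then tune `:(builder.n_hidden)` like any other hyper-parameter.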


I’ll have to punt a bit here and say that comparing MLJ(Flux), FastAI.jl and plain Flux on a holistic level is a daunting task. Instead, I’ll provide some guiding considerations and my opinions on which framework fits well with each.

  1. Are you trying to maximize re-usability/fit your model into a larger pipeline? MLJ really shines here because that’s part of its raison d’être. Flux is purposefully unopinionated on this front.

  2. Are you trying to accomplish a common deep learning task with best practices/conventions? This is really where FastAI shines. The benefit of having a prescriptive framework for the algorithmic side of things is that one can start with a good baseline and gradually tweak it, instead of having to research architecture, hyperparameters and training tricks at the beginning. That is not to say the framework is inflexible: one of the goals is to avoid the trap Python fell into of limiting extensibility for more advanced users.

  3. Are you trying to learn about DL or have a non-standard training loop? This is where Flux on its own is indispensable, as I feel frameworks have collectively overfit on a single-task supervised xor unsupervised training loop. Frankly, I’m not even sure there’s a one-size-fits-all model for ResNet-on-CIFAR supervised learning, GAN training, self-supervised shenanigans (both CV à la SimCLR and text à la BERT) and whatever crazy tricks du jour people are using to train massive language models.