What is the relation between MLJ and Flux?

According to the sites:

MLJ is an open-source machine learning toolbox written in pure Julia. It provides a uniform interface for interacting with supervised and unsupervised learning models currently scattered in different Julia packages.
Enhancements planned for the near future include integration of Flux.jl deep learning models, and gradient descent tuning of continuous hyperparameters using automatic differentiation.

Flux is a library for machine learning. It comes “batteries-included” with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

It sounds like Flux and MLJ are complementary somehow.

MLJ does not inherently provide models, whereas Flux does. MLJ focuses on making models provided by other packages available and composable with a unified syntax.
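To make the "unified syntax" idea concrete, here is a minimal sketch in Python (not MLJ's actual API). The two "package" classes and their adapters are hypothetical; the point is only that thin wrappers can put models with incompatible native APIs behind one fit/predict surface.

```python
from statistics import mean


class MeanRegressor:
    """Hypothetical package A: trains via `train`, predicts via `guess`."""

    def train(self, xs, ys):
        self.value = mean(ys)

    def guess(self, x):
        return self.value


class NearestNeighbor:
    """Hypothetical package B: trains via `learn`, predicts via `lookup`."""

    def learn(self, pairs):
        self.pairs = list(pairs)

    def lookup(self, x):
        # Return the target of the closest stored input.
        return min(self.pairs, key=lambda p: abs(p[0] - x))[1]


class MeanAdapter:
    """Adapter exposing package A through a common fit/predict surface."""

    def __init__(self):
        self.model = MeanRegressor()

    def fit(self, xs, ys):
        self.model.train(xs, ys)
        return self

    def predict(self, xs):
        return [self.model.guess(x) for x in xs]


class NeighborAdapter:
    """Adapter exposing package B through the same surface."""

    def __init__(self):
        self.model = NearestNeighbor()

    def fit(self, xs, ys):
        self.model.learn(zip(xs, ys))
        return self

    def predict(self, xs):
        return [self.model.lookup(x) for x in xs]


def evaluate(model, xs, ys, x_new):
    """Generic code that works with any adapter, whatever it wraps."""
    return model.fit(xs, ys).predict(x_new)
```

Anything written against `fit`/`predict`, such as `evaluate` here, never needs to know which underlying package it is driving.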

It makes sense to have an interface that allows Flux to be called from MLJ (this was the idea behind MLJFlux) while the opposite makes less sense.

A poor analogy is to say that Flux is to MLJ as TensorFlow is to sklearn, though of course sklearn provides and maintains a lot of models itself, which is not MLJ’s ambition.

Finally, I think it’s fair to say that Flux is focused around deep learning (neural nets) and even though it can do other things, it’s not really the point, whereas MLJ, in theory, allows you to work with any model that would match the fit-predict/transform idea.

Hope this helps

Short:
MLJ basically lets you access machine learning models in different packages using a common interface. Flux lets you build models using low-level tracking of gradients.

Long:
Basically, if you don’t mind learning another package’s syntax, there isn’t much point in using MLJ. That said, it is nice in that it tries to organize some disjoint collections of modelling packages for ease of use. In theory, I believe MLJ hopes to make it “easy” to build large swaths of automated modelling routines; as it currently stands, I don’t know if the syntax allows for that without metaprogramming, whereas other modelling packages might manage it more easily by sticking to in-house tools. The future looks good for MLJ, but right now it’s a wrapper for disparate efforts with a noble dream. MLJ has interop with things from outside of Julia, such as the infamous scikit-learn in Python; whether that’s good or bad is TBD, IMO.

I wonder if they’ll ever wrap any of the methods I have in my package? I think it’s mostly political how it gets decided what is included.

Flux is a cutting-edge, often unstable, pure Julia autodiff library. Its extremely performant, flexible, and generic syntax lets you easily perform numerical calculations that depend on derivative-based optimization, by way of Julia’s Zygote library (is this true as of 0.1.0?). One notices that a popular use case for these calculations is “neural networks”, but that isn’t a limitation of Flux, even though it provides a lot of convenience functions for building them. Crazy things are easily possible in Flux, and access to GPUArrays is easy, so training huge NNs on huge GPUs, or even many GPUs, is trivial compared to its competitors.

The relationship between the two is that you can use either of them to solve data driven inquiries. Sometimes you can use both of them (provided they are both stable enough).

I’ll assume that’s not your intent, but I perceive your message as being borderline inflammatory. I’ll just address the statement that what gets included is decided politically, which is wrong.

We don’t have the resources to write and maintain our own interfaces to all the packages out there, which is why we encourage package developers to write their own interface to MLJ. Provided the interface follows the examples in MLJModels, we will (eventually) add the package to the MLJ registry, once we’ve had a chance to test it. There are currently three such efforts ongoing: one to JLBoost, one to EvoTrees, and one to MulticlassPerceptron.

Most of the interfaces that are currently maintained in MLJModels such as the one to NearestNeighbors.jl or, indeed, to Sklearn are simply to the most popular libraries in the Julia ML ecosystem which users would assume to be available.

Not intended to be inflammatory, sorry if it came off as abrasive. That’s just realistically how I view the two packages in their current state. I could try to write an interface to MLJ, but it’s really hard to understand how it’s set up, or what’s changing, when, where, and why. I’m also not sure I like all the design decisions in it, and I wouldn’t want to even try to convince the maintainers otherwise, because from the outside it looks very much like an academic credibility effort (nothing wrong with that, but those things typically aren’t inclusive to outsiders).

It’s really hard to please everybody, unfortunately. Every field of ML tends to have a different perspective. Having an imperfect common interface to easily switch from one library to the next is amazingly convenient, and it’s the main reason for sklearn’s success. Furthermore, supporting MLJBase doesn’t prevent you from offering another interface, that more closely matches your algorithm’s particularities.

I always thought SKLearn’s success was having everything in one spot and having many people directly contributing to that single interface. If someone made a new fancy-shmancy algorithm, someone would add it to SKLearn, so end users didn’t have to wait for bridges from individual efforts across languages, packages, etc. The reason Python has hooks to C/C++ is not really because it was convenient, but because Python is way too slow to perform those modelling operations. Julia isn’t slow; we don’t need to have Julia call Python to call C/C++ to call back to Python to call back to Julia…

Truly though, SKLearn made a lot of errors along the way, in my opinion. It’s great if you want to do mindless ML, big old grid searches on SVMs, or xgboost the universe, but it’s pretty flawed for a lot of real work. I wouldn’t use SKLearn as a model for success, not to imply that you all are. I would use it as a model for what can easily be improved rather than imported. The rule of thumb is that people usually don’t switch technologies unless the new one is 10x better in at least one aspect.

A bit of an aside: you have this slightly backwards (or at least spoke unclearly):

  • Zygote is a pure Julia AutoDiff library
  • Flux is a neural network library that provides neural network layers and optimizers, and it depends upon Zygote for autodiff, (like it depends on GPUArrays for GPU)
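
The layering described in these two bullets can be sketched in Python as follows. This is a toy under loud assumptions: `Dual` and `derivative` stand in for the autodiff layer using forward-mode dual numbers (Zygote itself is a reverse-mode, source-to-source tool, so this is not how Zygote works), and `loss`/`train` stand in for the model-and-optimizer layer that merely calls into it. All names are hypothetical.

```python
class Dual:
    """Toy forward-mode autodiff number: a value and its derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__


def derivative(f, x):
    """The 'autodiff library' layer: df/dx evaluated at x."""
    return f(Dual(x, 1.0)).der


def loss(w, data):
    """The 'neural network library' layer: a one-weight linear model."""
    return sum((w * x - y) * (w * x - y) for x, y in data)


def train(data, w=0.0, lr=0.01, steps=200):
    """Plain gradient descent; the gradient comes from the autodiff layer."""
    for _ in range(steps):
        w -= lr * derivative(lambda p: loss(p, data), w)
    return w
```

The upper layer never differentiates anything by hand, which is the sense in which a layer-and-optimizer library can "depend upon" a separate autodiff package.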

Yeah, I wasn’t semantically secure. Flux is a wrapper around Zygote, which does the autodiff, with a flavor of tools that make it convenient to build neural networks (amongst other things); importing Flux still gives you autodiff (albeit through Zygote).

that’s what I get for multitasking :stuck_out_tongue:

Having re-read what I wrote, I realize I sound a bit like an insert expletive. I guess to me it’s just hard, because what the MLJ team does, with all the clout and hype behind it, dictates a lot of what happens in the early days as Julia ML becomes more mainstream.

Right now all progress made in Julia in this field is in the hands of people who know what they are doing. Wide adoption will feature a very different audience… Regardless of whether a better solution is proposed at this point, MLJ will win out politically for a long time. About a year ago someone was telling me how amazing MLJ was going to be, but I looked at it, tried using it, and had a very different view, and the state of it seems much the same despite a lot of progress. If MLJ treads a dead-end path, corners itself into an area that’s difficult to scale/maintain, or is very kludgy and inflexible for advanced work, it’s going to make Julia look really bad to lay people (the people who make business decisions), and be a stain for people who know what they are doing. It could come off as a cheap clone of something that already exists (Python, R), which it isn’t.

Now I realize these are all hypotheticals, and unsupportive opinions… So yea in short I care too much, you all are doing fine, shoulda kept my mouth shut, carry on.

I’m interested in hearing where you think sklearn went wrong. I see it as a very useful collection of tools, which are great for taking an initial stab at a problem. I end up doing custom work whenever the problem demands it, but that seems unavoidable. There’s no magical algorithm that solves all math problems. One has to exploit each problem’s special structure.

I will write you a list once I finish setting up this GPU box…

The thing that we did right with DifferentialEquations.jl is we used multiple dispatch to build a system where anyone could add algorithms. It’s all described here:

MLJ is building a very similar system for machine learning. Of course they have to write the first however many packages and wrappers for the system, because without it being useful there won’t be buy-in. However, at this point, if you create a fancy new machine learning algorithm and want MLJ users to use it, you can fairly easily add a few lines to then allow people to call your algorithm as part of an ensemble.

SciKitLearn, on the other hand, is very top-down: the algorithms that can be used are the ones that are in the blessed repo. That paper describes some of the advantages we’ve seen from a confederated system:

  1. Original authors more readily adopt the system because they can keep the package and get academic credit. Over time we’ve seen many of these migrate over to a standardized organization for help with maintenance, but that’s not necessarily true in all cases.

  2. Multiple implementations of the same method are allowed, which allows for different performance characteristics and gives a nice system for benchmarking for research.

  3. Even if people are averse to the whole world and write a new cool method/implementation in a way that is incompatible, you can just make a new package that slaps the common interface on it, and now it’s usable from the system without requiring the code to move to a new repo. Easy peasy.

  4. Being confederated, if some people don’t agree with a certain code structure, that’s fine. You can work in different repos in a way that doesn’t affect users.

In total, I think there’s a lot of advantages to this approach, and am glad MLJ has gone down this route.
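
A rough Python sketch of the confederated idea: the "core" package only declares generic `fit` and `predict` functions, and outside packages attach their own methods without touching the core. Python's `functools.singledispatch` dispatches on the first argument only, so it is a loose analogue of Julia's multiple dispatch, not an equivalent. `MeanModel`, `MedianModel`, and `ensemble_predict` are hypothetical names, not part of MLJ or DifferentialEquations.jl.

```python
from functools import singledispatch


# --- "core" package: declares the common interface, ships no models ---
@singledispatch
def fit(model, xs, ys):
    raise NotImplementedError(f"no fit method for {type(model).__name__}")


@singledispatch
def predict(model, fitted, xs):
    raise NotImplementedError(f"no predict method for {type(model).__name__}")


def ensemble_predict(models, xs, ys, x_new):
    """Average the predictions of any models that implement the interface."""
    all_preds = [predict(m, fit(m, xs, ys), x_new) for m in models]
    return [sum(col) / len(col) for col in zip(*all_preds)]


# --- hypothetical third-party package 1, kept in its author's own repo ---
class MeanModel:
    pass


@fit.register(MeanModel)
def fit_mean(model, xs, ys):
    return sum(ys) / len(ys)


@predict.register(MeanModel)
def predict_mean(model, fitted, xs):
    return [fitted for _ in xs]


# --- hypothetical third-party package 2 ---
class MedianModel:
    pass


@fit.register(MedianModel)
def fit_median(model, xs, ys):
    # Upper median, to keep the sketch short.
    return sorted(ys)[len(ys) // 2]


@predict.register(MedianModel)
def predict_median(model, fitted, xs):
    return [fitted for _ in xs]
```

Neither model package needed a change in the core to become usable in `ensemble_predict`, which is the property the list above is describing.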

Back to the original question, Flux is a deep learning library so it’s completely different.

Truth be told, it’s very easy to implement algorithms in scikit-learn and benefit from a lot of code from the library (pipelines, grid search, etc.). You just need to inherit from BaseEstimator, code a fit and a predict, and you are good to go. IMHO, the merit of MLJ is putting into the package the scientific type abstraction and the graph-like lego model, which will be pretty cool (and powerful) once the other parts are more mature.
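
The "inherit, code a fit and a predict" pattern can be sketched without scikit-learn installed. In the stdlib-only Python below, `SimpleEstimator` is a hypothetical stand-in for a base class (it only mimics a `set_params`-style method), and `grid_search` is a toy helper, not scikit-learn's grid search; the point is that generic tooling only needs the fit/predict surface.

```python
import itertools


class SimpleEstimator:
    """Hypothetical stand-in base class: only provides parameter setting."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ShrunkenMean(SimpleEstimator):
    """Predicts the training mean, shrunk toward zero by `shrink`."""

    def __init__(self, shrink=0.0):
        self.shrink = shrink

    def fit(self, xs, ys):
        self.mean_ = (1.0 - self.shrink) * sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


def grid_search(estimator, grid, xs, ys, x_val, y_val):
    """Try every parameter combination; return (best_params, best_error)."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        preds = estimator.set_params(**params).fit(xs, ys).predict(x_val)
        # Sum of squared errors on the held-out validation points.
        err = sum((p - y) ** 2 for p, y in zip(preds, y_val))
        if best is None or err < best[1]:
            best = (params, err)
    return best
```

The custom model inherits parameter handling for free and becomes searchable just by defining `fit` and `predict`.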

Probably this is not the post to discuss this, but… wouldn’t it be useful to have a post discussing what we miss in other frameworks (why weren’t pandas, scikit-learn, TensorFlow good enough)? I use scikit-learn; it’s a great package. Nevertheless, I had to add a lot of custom functions (tens of functions) in order to get good performance (and good behaviour with pandas). Examples range from a GridSearchCVSubset that uses a percentage of the data for cross-validation to a RandomForest that does not use all the validation data to compute the OOB scores.

FWIW, I really like the direction MLJ is taking, especially its new syntax, which embraces Julia’s callable structs (see 46:32 in this talk). I think MLJ will give Julia users a great playground to try different ML models, or ensembles of them, and different fine-tuning algorithms using a few lines of code and without having to learn the syntax of every ML package out there.

I think at some point, MLJ can also be used to recommend some algorithms for your data which if taken a step further can be a great tool to do research in automated machine learning (AutoML). Of course, just like with any open source package, its progress can be accelerated by contributing code, docs or feedback.

Kudos to @tlienart and all the other contributors for their great work in MLJ so far!

My issues aren’t with confederated code bases, I’m all fine and good with that. That structure is highly amenable to Julia in general and gives some breathing room. I am not convinced we can get all the advantages we might want going this route 100%, but that’s a debatable nuance not worth poking…

I’m not here to rain on MLJ. I misdirected my troubles on MLJ and that wasn’t fair. I’m sorry.

But as promised I am writing a list up of my gripes with sklearn, please feel free to shoot down the grammar, logical errors, and formatting, I don’t have a lot of time. Here’s some for starters:

  • Cross-validation quickly devolves into memorizing an API and basically accessing tons of weird custom kwargs in dicts. Nested cross-validations are sloppy and unintuitive, leading lay people to avoid them and improperly validate model parameters (I see it all the time). This could easily be improved in Julia; I favor the iterator approach, but hell, do as you please.
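
One possible reading of the "iterator approach", sketched in Python with generators (the poster's own design is not shown in the thread, so this is an assumption). `kfold` and `nested_kfold` are hypothetical helpers; the nested version draws its inner splits only from the outer training indices, so the outer test fold is never seen during tuning.

```python
def kfold(indices, k):
    """Yield (train, test) index lists for k strided folds."""
    indices = list(indices)
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test


def nested_kfold(n, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits) for n samples.

    inner_splits is itself an iterator over (train, test) pairs built
    only from outer_train.
    """
    for outer_train, outer_test in kfold(range(n), outer_k):
        yield outer_train, outer_test, kfold(outer_train, inner_k)
```

With this shape, nesting is just one loop inside another, with no keyword dictionaries to memorize.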

My worry with MLJ is that what we see is lots of macro calls that lock code/model building down into inflexible, non-generic paradigms. Sure it’s snappy, but people who know what they want look at something like that and end up trying to hack around it, or just go rogue after realizing an API won’t satisfy their needs without serious effort…

On the flipside macros could be used for model inspection! You ever leave work on Friday to train a model over the weekend, only to have it explode 8hrs in? What if you could evaluate a chunk of code which didn’t do any, or minimal parts of the math, just sort out indexing bugs and things. Trivial use case, but there are times where this sort of thing could be helpful, and AFAIK can’t be done in python.

  • Unification of tools… I could go on for days about how they implemented things and how it defies the elegance of the obvious pieces of theory in the field. Here’s a small example: sklearn.cluster.MiniBatchKMeans and sklearn.cluster.KMeans; well, why not SGD KMeans? What is different here is that one is an online learning algorithm. The notions of offline and online algorithms could be cleanly broken out and code reuse could happen (not necessarily here, but in other cases). Say we want to do PLS regression: performing online PLS involves a subset of the PLS calculation. Some algorithms for other models surely are similar. But why make a bunch of separate methods for it? Worse, what if they were in different packages and one had an error while the other didn’t? That’s confusing for an end user if maintenance debacles happen.
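
As a hedged illustration of the code reuse being argued for, here is a 1-D k-means sketch in Python where the offline and online variants share a single running-mean update. This is not how scikit-learn structures its KMeans or MiniBatchKMeans, and the online variant processes one sample at a time rather than mini-batches; all function names are hypothetical.

```python
def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - x))


def running_mean_step(centers, counts, i, x):
    """Shared kernel: fold x into center i as an incremental mean."""
    counts[i] += 1
    centers[i] += (x - centers[i]) / counts[i]


def online_kmeans(stream, centers):
    """Online variant: assign and update one sample at a time.

    Initial centers carry zero weight, so the first sample assigned
    to a center replaces it.
    """
    centers, counts = list(centers), [0] * len(centers)
    for x in stream:
        running_mean_step(centers, counts, nearest(centers, x), x)
    return centers


def batch_kmeans(data, centers, iters=10):
    """Offline variant: fix all assignments, then reuse the same update."""
    centers = list(centers)
    for _ in range(iters):
        assignments = [nearest(centers, x) for x in data]
        new, counts = list(centers), [0] * len(centers)
        for x, i in zip(data, assignments):
            running_mean_step(new, counts, i, x)
        centers = new  # empty clusters keep their previous center
    return centers
```

The only difference between the two variants is when the assignment is computed, which is the offline/online distinction the bullet asks to have broken out cleanly.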

I admit, this is somewhat OCD. Who cares how sloppy a code base is, or how far it sprawls, unless you’re trying to sort out a bug or vet it for industrial/certified usage (some companies won’t use packages unless they have been internally vetted)? For people just hacking away it’s no big deal if everything goes as planned. But Julia code can be beautiful, and it can elegantly link theory to practice. Python can’t really have that; we really need to tap into that, in my opinion…

A lot of models are simple compositions of transformations and other models. What you often see in SKLearn is that these aren’t handled that way whatsoever. Many models could be represented beautifully as a DAG, and treated that way internally. This is valuable for a lot of industries/workflows, whether they know it now or not… Many of those methods aren’t even in SKLearn last I checked, probably for this reason…
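
A minimal sketch of treating a composite model as a DAG, in Python (3.9+ for `graphlib`). The node functions and the `run_dag` helper are hypothetical; each node names the nodes it depends on, and evaluation simply follows a topological order.

```python
from graphlib import TopologicalSorter


def run_dag(nodes, inputs):
    """Evaluate a DAG of transformations.

    nodes:  name -> (function, [names of the nodes it consumes])
    inputs: name -> value, for source nodes supplied by the caller
    """
    graph = {name: deps for name, (_, deps) in nodes.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # source node, already provided
        fn, deps = nodes[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def double(xs):
    return [2 * x for x in xs]


def negate(xs):
    return [-x for x in xs]


def blend(a, b):
    return [(u + v) / 2 for u, v in zip(a, b)]


# One shared preprocessing step feeding two "models", then an average.
pipeline = {
    "centered": (center, ["x"]),
    "doubled": (double, ["centered"]),
    "negated": (negate, ["centered"]),
    "blend": (blend, ["doubled", "negated"]),
}
```

Because the structure is explicit data, the shared `centered` step is computed once and the graph can be inspected or rearranged without touching the node functions.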

  • 2 or N language problem: scikit-learn frequently calls down to C/C++ to perform operations. Who cares? I do. I care that I have to track down a cpp file, read the code, pull open a C++ editor, and debug machine learning tweaks in C++; it’s laborious and error-prone because I don’t write C++ for a living. I also care because you end up with multiple functions doing effectively the same things because there isn’t continuity in the codebase. I don’t have a specific example because I haven’t grokked SKLearn’s code in a while because… have you looked at it? It’s tens of thousands of lines.

Leveraging sklearn in Julia, sure, it’s nice, but it’s a band-aid. Besides, so many of these algorithms are differentiable when written in Julia, meaning I can hack away in Flux to add penalties to base methods. I had a Bayes method running at my last job doing exactly this. Why? Because it worked well and was a super simple tweak that improved performance, but I had to write the algorithm entirely in Julia to get that benefit, or manually write backprop rules into someone else’s code (not gonna happen).

  • Parallelism - SKLearn has some methods that can leverage multicore. Others that cannot. Sucks plain and simple. Julia can embed parallelism without doing shifty things! We gotta showcase this.

  • Modularity - The SKLearn codebase is a pile of python glue. It’s doing too many things. The data ops should be separate from the modelling things, the tuning ops should be separate as well. Especially now that tuning is widely considered a model. Yes that’s taste, but, by enforcing this, typically you get better separations of concerns and more generic code.

So yeah, that’s some off-the-cuff rambling about superficial issues I have with SKLearn. I think anyone who has used it for anything that wasn’t a Kaggle tutorial has had to hack away against its internals to get something done.
