Machine Learning Toolset Improvement

Is there any consolidated set of machine learning packages available for Julia? I have spent some time searching for one, but have only come across JuliaML, and it has not exactly been updated regularly. I would like to implement APIs similar to mlpack's (http://www.mlpack.org/about.html) in Julia. But for that, I would need to know what already exists and what doesn't. There must be some sort of framework for bringing all of this under one organized roof, right?

It would be great if someone can help me.

@Evizero’s GitHub activity begs to differ.

But seriously, JuliaML is at the moment a collection of low-level packages for working on ML algorithms (LossFunctions, PenaltyFunctions, LearningStrategies, MLLabelUtils, etc.). There isn't a high-level interface to models (at least not yet). A package like LossFunctions isn't very active because it has a fairly settled API for working with 20+ loss functions. It should only see activity when someone finds a new loss function to work with.
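To give a sense of what those low-level pieces look like, here is a minimal sketch using LossFunctions and PenaltyFunctions. The `value`/`deriv` verbs and argument order follow the package docs as I remember them, so check the READMEs if a newer release has changed things:

```julia
using LossFunctions, PenaltyFunctions

# A loss measures how badly a prediction matches a target.
loss = L2DistLoss()
value(loss, 1.0, 0.7)    # loss for a single (target, output) pair
deriv(loss, 1.0, 0.7)    # derivative with respect to the output

# A penalty measures the "size" of a coefficient (for regularization).
pen = L1Penalty()
value(pen, 0.5)          # penalty for a single coefficient
deriv(pen, 0.5)          # its (sub)derivative
```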

For what it’s worth, I’ve been working with LossFunctions, PenaltyFunctions, and LearningStrategies on SparseRegression with great success.

You may also be interested in Flux.jl (https://github.com/FluxML/Flux.jl), TensorFlow.jl (https://github.com/malmaud/TensorFlow.jl), and ScikitLearn.jl (https://github.com/cstjean/ScikitLearn.jl, docs at https://cstjean.github.io/ScikitLearn.jl/dev/).
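If it helps orient anyone, here is a minimal Flux sketch. The layer syntax has changed between Flux releases, so treat this as illustrative rather than canonical:

```julia
using Flux

# A tiny two-layer network; Flux stores data as features × samples.
model = Chain(Dense(4 => 8, relu), Dense(8 => 1))

x = rand(Float32, 4, 16)   # 16 samples with 4 features each
ŷ = model(x)               # forward pass, result is 1 × 16
```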

I am also a bit lost on the current state of affairs.

I think there are currently three interface packages for machine learning / modelling: StatsBase.jl, MLBase.jl, and ScikitLearnBase.jl. The functions they define are (not surprisingly) partly overlapping, especially fit!() and predict().

They differ in their focus but seem to have enough overlap that it looks like fragmentation (akin to the one in the data ecosystem), with different algorithm packages extending different ones of the above. I would appreciate it if someone familiar with the situation could give a brief overview of the current state and (if possible) where it is heading.

Will there soon be a common main interface that each new algorithm should implement, so it can take advantage of (yet-to-be-developed) general-purpose packages for feature selection, hyperparameter tuning, etc.?

ScikitLearn.jl author here. I started that project in part because it felt like StatsBase's interface wasn't consistent enough (across packages) to be usable at the time (e.g. there were some very long GitHub issues over whether the input matrices to fit! and transform ought to be n_samples × n_features or n_features × n_samples). Even today, the docstring of fit! is just "Fit a statistical model in-place.", which leaves too much unspecified IMO.

ScikitLearn.jl is pretty… I don't want to say "mature", but maybe "complete"? It does what it says on the box. Its interface has been implemented in a few Julia packages, and anyone is welcome to add their package to the list.
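For anyone who hasn't tried it, a typical session looks roughly like the sketch below. It assumes PyCall and the Python scikit-learn package are installed; pure-Julia models implementing ScikitLearnBase are used the same way:

```julia
using ScikitLearn
@sk_import linear_model: LogisticRegression

# ScikitLearn.jl settles the orientation question: X is n_samples × n_features.
X = rand(100, 4)
y = rand(0:1, 100)

model = LogisticRegression()
fit!(model, X, y)
ŷ = predict(model, X)
```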

TBH, I haven’t followed JuliaML. We talked last year about making its interface compatible with ScikitLearn.jl. Nothing has come out of it yet, but I’d be happy to consider a proposal.


Frankly I still use ScikitLearn.jl more often than not. I find it to be quite reliable. It would be nice to see more pure Julia machine learning though. It actually seems like a relatively fun thing to work on, but unfortunately I don’t really have the time.


JuliaML was essentially created out of a collection of people experimenting with learning algorithms. As such, it’s mostly a set of low-level tools which aid in writing algorithms.

You can build some really cool stuff very quickly with the primitives in JuliaML, but someone has to build it. As a proof of concept, here’s a quick writeup of LossFunctions/PenaltyFunctions:

https://joshday.github.io/2017/07/13/JuliaML.html

Also, OnlineStats can run a variety of online learning algorithms (SGD, AdaGrad, Adam, etc.) with just about any combination of loss/penalty:

http://joshday.github.io/OnlineStats.jl/latest/pages/api.html#OnlineStats.StatLearn
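To make the "any combination of loss/penalty" idea concrete, here is a hand-rolled sketch of a single SGD step built from the JuliaML primitives. This is not the OnlineStats API (see the StatLearn docs above for that), and the `deriv` argument order follows the LossFunctions docs of the time:

```julia
using LossFunctions, PenaltyFunctions, LinearAlgebra

# One stochastic gradient step for a linear model, parameterized by any
# loss/penalty pair from the JuliaML packages. Illustrative only.
function sgd_step!(β, x, y, loss, penalty; η = 0.01, λ = 0.1)
    g = deriv(loss, y, dot(β, x))     # d(loss)/d(prediction)
    for j in eachindex(β)
        β[j] -= η * (g * x[j] + λ * deriv(penalty, β[j]))
    end
    return β
end

β = zeros(3)
sgd_step!(β, randn(3), 1.0, L2DistLoss(), L2Penalty())
```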

If anyone is interested in starting to implement some more methods in Julia, “Elements of Statistical Learning” is the essential reference and the PDF is freely available.


Hey

I wonder if anything has changed here, especially regarding plans for cooperation/unification?

From my perspective, ScikitLearn.jl + ScikitLearnBase.jl are very nice, and it would be good if more pure-Julia models supported them. Right now there are a lot of pure-Julia algorithms implemented in the JuliaStats organization, but they all support the StatsBase abstractions, and since Julia doesn't support multiple inheritance and the method names clash, it's not clear to me how to also add support for the ScikitLearnBase.jl interface.

Perhaps it would be a good idea to unify/extend the modelling abstractions defined in StatsBase.jl and ScikitLearnBase.jl to cover the needs of both JuliaStats and ScikitLearn.jl (and perhaps move them to StatsModels.jl)? That way everyone would extend the same abstractions and the Julia ecosystem would be stronger; we all know how popular scikit-learn is in Python. Or perhaps there is a better solution?

PS: I hope you don’t mind my “necroposting” - seemed appropriate in this case.


I think you raise an important point. Maybe traits or composition can help here.
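For instance, since StatsBase.fit! and ScikitLearnBase.fit! are distinct generic functions, the clash is only in the exported names, and a model type can extend both with module-qualified methods. Just a sketch, assuming ScikitLearnBase's fit!/predict generics and a hypothetical toy model:

```julia
import StatsBase, ScikitLearnBase
using Statistics: mean

# A toy constant-predictor, purely to illustrate extending both interfaces.
mutable struct ConstantModel
    μ::Float64
    ConstantModel() = new(0.0)
end

# StatsBase interface
StatsBase.fit!(m::ConstantModel, X, y) = (m.μ = mean(y); m)
StatsBase.predict(m::ConstantModel, X) = fill(m.μ, size(X, 1))

# ScikitLearnBase interface, forwarding to the StatsBase methods
ScikitLearnBase.fit!(m::ConstantModel, X, y) = StatsBase.fit!(m, X, y)
ScikitLearnBase.predict(m::ConstantModel, X) = StatsBase.predict(m, X)
```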

Here's the scikit-learn roadmap: Roadmap — scikit-learn documentation

Some interesting stuff has been added.

All this is so much easier in Julia. I wouldn't be surprised if the Julia ScikitLearn port managed to support some of these before the Python original. Provided that it wants to and starts to attract some contributors, of course. :slight_smile:
