Machine Learning Toolset Improvement


#1

Is there any consolidated set of machine learning packages available for Julia? I have spent some time searching for one, but have only come across JuliaML, which has not exactly been updated regularly. I would like to implement APIs similar to MLPack's (http://www.mlpack.org/about.html) in Julia, but for that I would need to know what already exists and what doesn't. There must be some sort of framework for bringing all of this under one organized roof, right?

It would be great if someone can help me.


#2

@Evizero’s GitHub activity begs to differ.

But seriously, JuliaML at the moment is a collection of low-level packages for working on ML algorithms (LossFunctions, PenaltyFunctions, LearningStrategies, MLLabelUtils, etc.). There isn't a high-level interface to models (at least not yet). A package like LossFunctions isn't very active because it has a fairly settled API for working with 20+ loss functions; it should only see activity when someone finds a new loss function to add.
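To give a flavor of what these low-level primitives look like, here is a toy sketch in the same spirit. The type and function names below are made up for illustration; they are not the real LossFunctions.jl API, which you should consult directly.

```julia
# Toy sketch of the loss-function abstraction that a package like
# LossFunctions provides: each loss is a type, and generic functions
# compute values and derivatives. (Names here are illustrative only.)
abstract type Loss end

struct SquaredLoss <: Loss end
struct HingeLoss <: Loss end

lossvalue(::SquaredLoss, y, ŷ) = (ŷ - y)^2
lossderiv(::SquaredLoss, y, ŷ) = 2 * (ŷ - y)

lossvalue(::HingeLoss, y, ŷ) = max(0.0, 1 - y * ŷ)
lossderiv(::HingeLoss, y, ŷ) = y * ŷ < 1 ? -y : 0.0

# An algorithm written against the abstraction works with any loss:
total_loss(L::Loss, ys, ŷs) = sum(lossvalue(L, y, ŷ) for (y, ŷ) in zip(ys, ŷs))
```

The point of the design is that an optimizer only ever calls lossvalue/lossderiv, so swapping losses requires no changes to the algorithm itself.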

For what it’s worth, I’ve been working with LossFunctions, PenaltyFunctions, and LearningStrategies on SparseRegression with great success.

You may also be interested in https://github.com/MikeInnes/Flux.jl, https://github.com/malmaud/TensorFlow.jl, and https://github.com/cstjean/ScikitLearn.jl.


#3

I am also a bit lost on the current state of affairs.

I think there are currently three interface packages for machine learning / modeling: StatsBase.jl, MLBase.jl, and ScikitLearnBase.jl, and the functions defined in them are (not surprisingly) partly overlapping (especially fit!() and predict()).

They differ in their focus but seem to have enough overlap that it looks like fragmentation (akin to the one in the data ecosystem), with different algorithm packages extending different ones of the above. I would appreciate it if someone familiar with the situation could give a brief overview of the current state and (if possible) where it is heading.

Will there be a common main interface soon that each new algorithm should implement, so it can take advantage of (to-be-developed) general-purpose packages for feature selection, hyperparameter tuning, etc.?
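The overlap described above centers on a common pattern: an interface package owns the generic functions, and each algorithm package extends them for its own model type. Here is a hypothetical sketch of that pattern (not the actual code of any of the three packages):

```julia
# Hypothetical sketch of the shared interface pattern: the interface
# package declares an abstract type and generic functions...
abstract type Model end

function fit! end
function predict end

# ...and an algorithm package adds methods for its own model type.
# MeanModel is a made-up toy model that predicts the training mean.
mutable struct MeanModel <: Model
    μ::Float64
end
MeanModel() = MeanModel(0.0)

function fit!(m::MeanModel, y::AbstractVector{<:Real})
    m.μ = sum(y) / length(y)
    return m
end

predict(m::MeanModel, n::Integer) = fill(m.μ, n)
```

The fragmentation question is then exactly which package should own the empty fit!/predict generics that everyone else extends.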


#4

ScikitLearn.jl author here. I started that project in part because it felt like StatsBase's interface wasn't consistent enough (across packages) to be usable at the time (e.g. there were some very long GitHub issues over whether the input matrices to fit! and transform ought to be N_sample X N_feature or N_feature X N_sample). Even today, the description of fit! is just "Fit a statistical model in-place.", which leaves too much unspecified, IMO.

ScikitLearn.jl is pretty… I don't want to say "mature", but maybe "complete"? It does what it says on the box. Its interface has been implemented in a few Julia packages, and anyone is welcome to add their package to the list.
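For readers unfamiliar with it, a minimal usage sketch looks roughly like this (importing a Python model via @sk_import assumes PyCall and a working Python scikit-learn installation; check the ScikitLearn.jl README for the authoritative examples):

```julia
# Rough sketch of the ScikitLearn.jl interface: fit!/predict over a
# model object, with data laid out as N_sample X N_feature.
using ScikitLearn
@sk_import linear_model: LinearRegression

X = rand(100, 2)             # rows are samples, columns are features
y = X * [1.0, 2.0] .+ 0.5

model = LinearRegression()
fit!(model, X, y)
ŷ = predict(model, X)
```

Pure-Julia models that implement ScikitLearnBase.jl plug into the same fit!/predict verbs without the Python dependency.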

TBH, I haven’t followed JuliaML. We talked last year about making its interface compatible with ScikitLearn.jl. Nothing has come out of it yet, but I’d be happy to consider a proposal.


#5

Frankly I still use ScikitLearn.jl more often than not. I find it to be quite reliable. It would be nice to see more pure Julia machine learning though. It actually seems like a relatively fun thing to work on, but unfortunately I don’t really have the time.


#6

JuliaML was essentially created out of a collection of people experimenting with learning algorithms. As such, it’s mostly a set of low-level tools which aid in writing algorithms.

You can build some really cool stuff very quickly with the primitives in JuliaML, but someone has to build it. As a proof of concept, here’s a quick writeup of LossFunctions/PenaltyFunctions:

https://joshday.github.io/2017/07/13/JuliaML.html

Also, OnlineStats can run a variety of online learning algorithms (SGD, AdaGrad, Adam, etc.) with just about any combination of loss/penalty:

http://joshday.github.io/OnlineStats.jl/latest/pages/api.html#OnlineStats.StatLearn
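To make "online learning with a loss/penalty combination" concrete, here is a from-scratch sketch of the kind of update such an algorithm performs: plain SGD on squared loss with an L2 penalty. This illustrates the underlying idea only; it is not the OnlineStats API.

```julia
using LinearAlgebra: dot

# One SGD step for a linear model: gradient of the squared loss
# (ŷ - y)^2 plus the gradient of an L2 penalty λ‖w‖², scaled by
# learning rate η. Mutates the weight vector w in place.
function sgd_step!(w, x, y; η = 0.1, λ = 0.01)
    ŷ = dot(w, x)
    grad = 2 * (ŷ - y) .* x        # loss gradient w.r.t. w
    @. w -= η * (grad + 2λ * w)    # add the L2 penalty gradient
    return w
end

# Stream observations one at a time, as an online learner would:
w = zeros(2)
for (x, y) in [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
    sgd_step!(w, x, y)
end
```

Swapping in a different loss or penalty only changes the two gradient terms, which is exactly the flexibility the loss/penalty primitives are meant to provide.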


#7

If anyone is interested in starting to implement some more methods in Julia, "The Elements of Statistical Learning" is the essential reference, and the PDF is freely available.