Is there any consolidated set of machine learning packages available for Julia? I have spent some time trying to search for one, but have only come across JuliaML, and it has not exactly been updated regularly. I would like to implement APIs similar to MLPack (http://www.mlpack.org/about.html) in Julia. But for that, I would need to know what already exists and what doesn’t; there must be some sort of framework for bringing all of this under one organized roof, right?
But seriously, JuliaML at the moment is a collection of low-level packages for working on ML algorithms (LossFunctions, PenaltyFunctions, LearningStrategies, MLLabelUtils, etc.). There isn’t a high-level interface to models (at least not yet). A package like LossFunctions isn’t very active because it has a fairly settled API for working with 20+ loss functions; it should only see activity when someone finds a new loss function to add.
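To give a concrete feel for how low-level those pieces are, here is a minimal sketch of using LossFunctions on its own. The value/deriv signatures below reflect the API as I remember it from around that time and may differ in newer releases:

```julia
using LossFunctions

loss = L2DistLoss()              # squared-distance loss for regression
targets = [1.0, -0.5, 2.0]
outputs = [0.8, -0.2, 2.5]       # model predictions

# per-observation loss values and derivatives with respect to the output
losses = [value(loss, t, o) for (t, o) in zip(targets, outputs)]
grads  = [deriv(loss, t, o) for (t, o) in zip(targets, outputs)]
```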
For what it’s worth, I’ve been working with LossFunctions, PenaltyFunctions, and LearningStrategies on SparseRegression with great success.
I am also a bit lost on the current state of affairs.
I think there are currently three interface packages for machine learning / modeling in Julia: StatsBase.jl, MLBase.jl, and ScikitLearnBase.jl, and the functions defined in them are (not surprisingly) partly overlapping (especially fit!() and predict()).
They differ in their focus but seem to have enough overlap that it looks like fragmentation (akin to the one in the data ecosystem), with different algorithm packages extending different ones of the above. I would appreciate it if someone familiar with the situation could give a brief overview of the current state and (if possible) where it is heading.
Will there be a common main interface soon that each new algorithm should implement, so it can take advantage of (to-be-developed) general-purpose packages for feature selection, hyperparameter tuning, etc.?
ScikitLearn.jl author here. I started that project in part because it felt like StatsBase’s interface wasn’t consistent enough (across packages) to be usable at the time (e.g. there were some very long GitHub issues over whether the input matrices to fit! and transform ought to be N_sample X N_feature or N_feature X N_sample). Even today, the description of fit! is just “Fit a statistical model in-place.”, which leaves too much unspecified IMO.
ScikitLearn.jl is pretty… I don’t want to say “mature”, but maybe “complete”? It does what it says on the box. Its interface has been implemented in a few Julia packages, and anyone is welcome to add their package to the list.
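For anyone curious what that takes, here is a rough sketch of implementing the ScikitLearnBase interface for a made-up estimator. ToyRidge is hypothetical; the pattern follows the ScikitLearnBase README as I understand it, so treat the details as approximate:

```julia
import ScikitLearnBase
using LinearAlgebra: I

# Hypothetical ridge regressor; X follows the scikit-learn convention of
# n_samples × n_features.
mutable struct ToyRidge
    alpha::Float64           # regularization strength (hyperparameter)
    coefs::Vector{Float64}   # learned weights
    ToyRidge(; alpha=1.0) = new(alpha)
end

# Generates clone/get_params/set_params! for the listed hyperparameters.
ScikitLearnBase.@declare_hyperparameters(ToyRidge, [:alpha])

function ScikitLearnBase.fit!(model::ToyRidge, X, y)
    model.coefs = (X'X + model.alpha * I) \ (X'y)
    return model             # fit! returns the model by convention
end

ScikitLearnBase.predict(model::ToyRidge, X) = X * model.coefs
```

With that in place, the model can be dropped into ScikitLearn.jl pipelines and cross-validation like any other estimator.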
TBH, I haven’t followed JuliaML. We talked last year about making its interface compatible with ScikitLearn.jl. Nothing has come of it yet, but I’d be happy to consider a proposal.
Frankly I still use ScikitLearn.jl more often than not. I find it to be quite reliable. It would be nice to see more pure Julia machine learning though. It actually seems like a relatively fun thing to work on, but unfortunately I don’t really have the time.
JuliaML was essentially created out of a collection of people experimenting with learning algorithms. As such, it’s mostly a set of low-level tools which aid in writing algorithms.
You can build some really cool stuff very quickly with the primitives in JuliaML, but someone has to build it. As a proof of concept, here’s a quick writeup of LossFunctions/PenaltyFunctions:
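Roughly, combining the two looks like this - a minimal sketch of penalized linear regression by plain gradient descent, with the caveat that the exact deriv/grad signatures are from memory of that era’s API:

```julia
using LossFunctions, PenaltyFunctions
using LinearAlgebra: dot

# Gradient descent for a penalized linear model, written directly in terms of
# the JuliaML primitives: `deriv` from LossFunctions for dL/dŷ per sample and
# `grad` from PenaltyFunctions for the penalty gradient.
function fit_linear(X, y; loss = L2DistLoss(), penalty = L2Penalty(),
                    λ = 0.1, η = 0.01, iters = 1_000)
    n, p = size(X)
    w = zeros(p)
    for _ in 1:iters
        r = [deriv(loss, y[i], dot(X[i, :], w)) for i in 1:n]
        ∇ = X'r ./ n .+ λ .* grad(penalty, w)
        w .-= η .* ∇
    end
    return w
end
```

The same loop works with, say, HingeLoss or L1Penalty swapped in, which is roughly the selling point of keeping these as small composable packages.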
If anyone is interested in starting to implement some more methods in Julia, “Elements of Statistical Learning” is the essential reference and the PDF is freely available.
I wonder if there are any changes here, especially regarding plans for cooperation / unification?
From my perspective, ScikitLearn.jl + ScikitLearnBase.jl are very nice, and it would be good if more pure-Julia models supported them. There are already a lot of pure-Julia algorithms implemented in the JuliaStats org, but they all support the StatsBase abstractions, and since Julia doesn’t support multiple inheritance and the method names clash, it’s not clear to me how to add support for the ScikitLearnBase.jl interface.
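To make the name-clash issue concrete, here is a hedged sketch of how one package could in principle extend both interfaces (MyModel is hypothetical). Since StatsBase and ScikitLearnBase each own their own generic fit!/predict functions, no inheritance is needed, though differing conventions such as matrix orientation still have to be reconciled by hand:

```julia
import StatsBase
import ScikitLearnBase

mutable struct MyModel            # hypothetical model type
    coefs::Vector{Float64}
    MyModel() = new(Float64[])
end

# StatsBase convention: extend StatsBase's generic functions.
function StatsBase.fit!(m::MyModel, X, y)
    m.coefs = X \ y               # ordinary least squares
    return m
end
StatsBase.predict(m::MyModel, X) = X * m.coefs

# ScikitLearnBase defines its own fit!/predict generics, so the same type
# can extend those too; unqualified calls just have to pick one.
ScikitLearnBase.fit!(m::MyModel, X, y) = StatsBase.fit!(m, X, y)
ScikitLearnBase.predict(m::MyModel, X) = StatsBase.predict(m, X)
```

That said, writing this glue in every package is exactly the duplication that a shared set of abstractions would avoid.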
Perhaps it would be a good idea to unify/extend the modelling abstractions defined in StatsBase.jl and ScikitLearnBase.jl to cover the needs of both JuliaStats and ScikitLearn.jl (and perhaps move them to StatsModels.jl)? That way everyone would extend the same abstractions and the Julia ecosystem would be stronger - we all know how popular scikit-learn is in Python. Or perhaps there is a better solution?
PS: I hope you don’t mind my “necroposting” - seemed appropriate in this case.
All this is so much easier in Julia. I wouldn’t be surprised if the Julia ScikitLearn port managed to support some of these before the Python original - provided that it wants to and starts to attract some contributors, of course.