I would like to take this opportunity to ask the community about the future of ML in Julia, and how the existing efforts interact. I am particularly interested in understanding to what extent the JuliaML organization and the MLJ.jl project are connected. I would appreciate it if the leaders of these projects could share their views on their roadmaps, and on whether it would be possible to pool efforts towards a unified solution.
Assuming you’re talking about “classical ML” / “data analysis” (so not specifically DL/RL), I’d say the basic bricks for tabular data analysis are:
1. data wrangling (imputation, encoding, dimensionality reduction, …)
2. basic models (KNN, logreg, DT, SVM, …)
3. model composability (pipeline, stacking, ensembling)
4. tuning and evaluation (hyperparam tuning, comparing metrics/models/…)
All of these are represented in the ecosystem (sometimes multiple times), though the packages don’t necessarily talk to each other very well and often have different goals.
Afaik MLJ focuses on 3-4, hoping that this will spur the development, compatibility and maintainability of 1-2. Indeed, if it becomes an entry point through which users can call many of the more specific packages, that could mean more feedback for the package devs. My hope is also that it would help identify and highlight missing or inefficient functionality.
As for the interaction between MLJ and JuliaML, I’m sure other people can give better insight than I can, but I think a few of the packages in JuliaML offer metrics/encoding which have been re-defined in MLJ, as that was needed for the way MLJ deals with types, metadata etc. I would think it’s a similar story for MLDataPattern.
Other packages from JuliaML which are more related to optimisation / learning strategies are not directly relevant to MLJ (though I may be wrong here).
There is LossFunctions, though, which should be interfaced with soon (as you know, I believe). There may also be scope for interacting with ValueHistories and MLPreprocessing, and maybe for copying a few things from MLPlots. Beyond that, I’m not sure.
Thank you @tlienart for sharing your views. I believe that the items 1-4 you listed make a lot of sense. I also think that packages like LossFunctions.jl are quite useful to have, or to incorporate into the MLJ.jl project.
Would it make sense to migrate MLJ.jl to JuliaML? Or maybe find a way to share with the community that the development directions are heading towards MLJ.jl and that other packages in JuliaML are deprecated nowadays?
Ideally, members from JuliaML and alan-turing-institute would join a single organization to speed up the development of “classic” ML in Julia. However, I am just sharing my opinion on the subject to see if there is interest from both sides.
Thanks for the work! I am starting to depend on MLJBase.jl in my own packages, and I am enjoying the project more and more.
I think this is currently unlikely given that the project is supported by the ATI. Also, I don’t believe it’s the ambition of MLJ to define how things should be done in ML in Julia, but rather to offer incentives for people to interface with it, so that more functionality becomes easily available and composable for a “non-dev” end-user.
That being said, I agree with some part of your second point: with time and users (and benchmarks), MLJ should help identify weaknesses/missing features in the ecosystem which can be fed back to the community.
I think defining how ML is done in Julia is pretty much what you said. An API, an incentive, whatever we call it. As soon as the community converges on evolved best practices, we are all set.
I have only written a couple of lines of Julia, so I am no authority on this; take what I say here as only a superficial opinion.
To unify both projects, if they were interested, it would be very good to organize a hackathon at some JuliaCon, so that the architecture is defined and the code is contributed by the whole expert community. There is no better opportunity for this kind of work than these conferences.