Hello everyone and a happy new year!
I am happy to announce a new JuliaML package for working with classification targets in various formats. It is not yet registered (so don’t believe the documentation!), but I will create a PR to METADATA soon.
GitHub: https://github.com/JuliaML/MLLabelUtils.jl
Description
MLLabelUtils.jl was designed with package developers in mind who need their classification targets to be in a specific format. To that end, the core focus of this package is to provide all the tools needed to deal with classification targets of arbitrary format. This includes asserting that the targets are in a desired encoding, inferring the concrete encoding the targets are in and how many classes they represent, and converting from their native encoding to the desired one.
The key problem it addresses is that many classification models require the targets in a specific encoding. For example, a textbook implementation of logistic regression requires the targets to be in the set {1,0}. In contrast, neural networks typically work on targets that are one-hot encoded. Then there are margin-based classifiers, such as SVMs, that expect the targets to be in {-1,1}. Add to all this that the targets one usually starts out with are strings.
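To make the encodings above concrete, here is a minimal sketch in plain Julia. It deliberately does not use the MLLabelUtils.jl API; the variable names and the helpers `is_zero_one` and `is_margin` are made up for illustration, and treating the first class as the positive one is an arbitrary assumption.

```julia
# Illustration only: hand-rolled conversions in base Julia,
# NOT the MLLabelUtils.jl API.

# Raw targets as they often arrive: strings.
targets = ["spam", "ham", "ham", "spam"]

# Infer the classes, in order of first appearance.
classes = unique(targets)

# Assumption: the first class is the "positive" one.
positive = classes[1]

# {1,0} encoding, e.g. for textbook logistic regression.
zero_one = [t == positive ? 1 : 0 for t in targets]

# {-1,1} encoding, e.g. for margin-based classifiers such as SVMs.
margin = [t == positive ? 1 : -1 for t in targets]

# One-hot encoding: one row per class, one column per observation.
one_hot = [Int(t == c) for c in classes, t in targets]

# Hypothetical helpers that check whether a vector uses a given encoding.
is_zero_one(y) = all(v -> v == 0 || v == 1, y)
is_margin(y) = all(v -> v == -1 || v == 1, y)
```

Even in this toy version, every model family needs its own conversion and its own sanity check, which is the kind of boilerplate the package is meant to take off your hands.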
Check it out and let me know what you think! The documentation is very thorough for such a small package.
Documentation: MLLabelUtils.jl 0.0.1 documentation
Context
The main motivation behind creating this package is to serve as a lightweight back-end for other in-development JuliaML packages (LossFunctions.jl, MLMetrics.jl, and MLDataUtils.jl) that deal with classification problems in one way or another. As such, it is well tested, and a lot of emphasis was put on type stability and flexibility. I also put a lot of effort into making it user-friendly.
Converting targets between encodings is admittedly not the most exciting task in machine learning, but I think it is nevertheless a core task that needs to be handled properly for Julia to gain momentum in the applied ML domain. I see it as a rather important aspect of JuliaML to provide quality tools and documentation for solving these basic tasks.
Closing Words
Let me know what you think. Any kind of feedback or criticism is very welcome!
PS: We even support lazy, on-the-fly conversion using MappedArrays.