ANN: MLDataPattern.jl

Evizero · April 20, 2017, 12:12pm

Hello everyone!

I am really happy to finally announce the next JuliaML package reaching a stable state: MLDataPattern.jl

GitHub: https://github.com/JuliaML/MLDataPattern.jl

With it comes the long overdue update of MLDataUtils.jl, which now uses MLDataPattern as one of its back-ends and thus serves as a meta package. It took us months to finally get here. The last tag of MLDataUtils was right around the 0.5 release. Since then we have completely redesigned the data munging functionality and, just recently, moved it into its own package, MLDataPattern, because of code complexity. With this change, the original package MLDataUtils now serves as a convenient end-user facing package that reexports all data-related functionality of JuliaML.

GitHub: https://github.com/JuliaML/MLDataUtils.jl

Description

MLDataPattern is a long-running effort by a few of us to design and implement a package for common ML data access patterns in a Julian manner. As such, you may find it a bit unintuitive at first if you are used to frameworks from other languages. Yet we think the benefits are worth it. Most notably, the package provides a number of patterns for lazily shuffling, partitioning, and resampling data sets of various types and origins. At its core, the package was designed around the key requirement of allowing any user-defined type to serve as a custom data source and/or access pattern in a first-class manner. We tried to accomplish this by designing the package to be as data container agnostic as we could.
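
To give a rough feel for what "data container agnostic" means, here is a minimal sketch in Python. It is a conceptual illustration only and not the package's Julia API: `SquaresOnDemand` and `resample` are hypothetical names invented for this example, and the only assumption made about a data source is that it reports its number of observations and can be indexed.

```python
import random


class SquaresOnDemand:
    """Hypothetical user-defined data source: observation i is computed on access."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


def resample(container, k, seed=0):
    """Draw k observations with replacement.

    Relies only on len() and indexing, so any type offering both will work.
    """
    rng = random.Random(seed)
    n = len(container)
    return [container[rng.randrange(n)] for _ in range(k)]


# The same function works on a built-in list and on the custom type.
from_list = resample([0, 1, 4, 9, 16], 8, seed=1)
from_custom = resample(SquaresOnDemand(5), 8, seed=1)
```

Because `resample` never inspects the container beyond its length and its elements, both calls draw the same observations when given the same seed.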

Check it out! The documentation is very comprehensive.

Documentation: MLDataPattern.jl 0.1 documentation

Closing Words

Let me know what you think. Any kind of feedback or criticism is very welcome!

Big thanks to @tbreloff and @oxinabox for design and code contributions to the data access patterns!

mkborregaard · April 20, 2017, 12:50pm

This looks really useful. From a first look, much of the functionality seems like it could extend well beyond machine learning as general data handling functionality, perhaps by defining it on new user types?
Yet you write that it explicitly belongs under the framework of machine learning. Can you expand a bit on why that is? It would be useful to know before I consider applying it to another purpose.
Thanks!

Evizero · April 20, 2017, 1:07pm

Well, really MLDataPattern is about nesting data sub-setting operations. This is a common theme in machine learning, which is one of the very few interesting areas that I know something about. I emphasize in the documentation that it is “machine learning specific” to make clear that the MLDataPattern package has nothing to do with any select/groupby/summarize kind of operation. It doesn’t care about the data itself, or even what it represents (other than potential prediction targets). It only cares about sub-setting.
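
To make "nesting sub-setting operations" a bit more concrete, here is a rough sketch of the idea in Python. It is only an illustration under stated assumptions and not the package's Julia API: `LazySubset`, `shuffle`, and `split` are hypothetical names made up for this example. The point is that each operation only rearranges or selects indices, and nested subsets collapse into a single index list over the original data.

```python
import random


class LazySubset:
    """View of `data` restricted to `indices`; nothing is copied until get()."""

    def __init__(self, data, indices):
        indices = list(indices)
        # Nesting: a subset of a subset composes the index lists, so the
        # result still points directly at the root container.
        if isinstance(data, LazySubset):
            indices = [data.indices[i] for i in indices]
            data = data.data
        self.data = data
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def get(self):
        """Materialize the selected observations."""
        return [self.data[i] for i in self.indices]


def shuffle(data, seed=0):
    """Lazily shuffle: only a permutation of the indices is created."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return LazySubset(data, idx)


def split(data, at):
    """Lazily split into two subsets at the given fraction."""
    n = len(data)
    cut = round(at * n)
    return LazySubset(data, range(cut)), LazySubset(data, range(cut, n))


data = list(range(10, 20))
train, test = split(shuffle(data, seed=42), at=0.7)
```

Here `train` and `test` both refer straight to `data`; no observation is copied until `get()` is called, and the operations never look at what the observations contain.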

Does that answer your question? If there is some specific use case you are uncertain of, please also feel free to message me personally on gitter.

mkborregaard · April 20, 2017, 1:08pm

Thanks, I’ll catch you on gitter

mauro3 · April 20, 2017, 1:37pm

(I can’t find a link to the gitter room. Where is it? Maybe worth linking to it too.)

Evizero · April 20, 2017, 1:42pm

Gitter: JuliaML/chat

We do have a section on “Getting help” where it is listed. Maybe I should add it to the readme as well.

tim.holy · April 21, 2017, 1:32pm

This looks like an elegant approach to easing some very common operations. Nice work!
