ChemometricsTools.jl: a machine learning library for chemists (or really anyone)

I am sharing this a little early, but I received some good feedback from a first vocal visitor, and I’m eager to hear more before I polish this package further. So far it’s a one-man show, built from thin air in my free time - so please be kind.

ChemometricsTools.jl

This package contains a collection of tools to perform fundamental and advanced chemometric analyses in Julia. It is currently richer and more fundamental than any single free chemometrics package available in any other language. If you are unfamiliar with chemometrics, it could inelegantly be described as the marriage between data science and chemistry. Traditionally it is a pile of applied linear algebra/statistics that is well reasoned by the physics and meaning of chemical measurements. This is somewhat orthogonal to most fields of machine learning (aka “add more layers”). Sometimes chemometricians also get desperate and break out pure machine learning methods, so some of those methods are in this package. If you want neural networks, though, try Flux.jl.

Tutorials/Demonstrations:

Shootouts/Modeling Examples:

Package Status => “Registrator” release! (v 0.4.6)

ChemometricsTools is pretty new, and was recently accepted for registration as an official Julia package! Yep, so you can Pkg.add("ChemometricsTools") to install it. The git repo’s master branch has the most stable version right now; I have fixed a lot of bugs since 0.2.3. In 0.4.6 almost all of the available functionality can reliably be used/abused. The documentation is getting there, but it’s hard to keep up with all the features I’ve been adding. There are probably still a few bugs. I have some interesting plans for v0.5.0, and I’ve personally been testing this package while doing some work with a fellow analytical chemist.

Package Highlights

Transforms/Pipelines

Two design choices introduced in this package are “Transformations” and “Pipelines”. We can use transformations to treat data from multiple sources the same way. This helps mitigate user error for cases where test data is scaled based on training data, calibration transfer, etc.

Multiple transformations can easily be chained together and stored using “Pipelines”. Pipelines aren’t “pipes” like those in Bash, R, and base Julia. They are flexible, yet immutable, convenience objects that allow for sequential preprocessing and data transformations to be reused, chained, or automated for reliable analytic throughput.
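To make the idea concrete, here is a minimal sketch in plain Julia of a transform that is fit on training data and then reused on test data, plus a tiny immutable pipeline that chains steps. The names CenterScaleSketch, PipelineSketch, and firstdiff are made up for illustration and are not the ChemometricsTools.jl API; the data is a toy example.

```julia
using Statistics

# Hypothetical names for illustration; NOT the ChemometricsTools.jl API.
# A fitted transform stores the statistics learned from one dataset.
struct CenterScaleSketch
    mu::Matrix{Float64}      # 1 x p column means
    sigma::Matrix{Float64}   # 1 x p column standard deviations
end

# "Fit" on X by recording its column means and standard deviations.
CenterScaleSketch(X::AbstractMatrix) =
    CenterScaleSketch(mean(X, dims = 1), std(X, dims = 1))

# Applying the transform to any data reuses the stored statistics.
(t::CenterScaleSketch)(X::AbstractMatrix) = (X .- t.mu) ./ t.sigma

# A stateless step: first difference along the variable (column) axis.
firstdiff(X::AbstractMatrix) = diff(X, dims = 2)

# A pipeline is an immutable, ordered tuple of steps.
struct PipelineSketch{T<:Tuple}
    steps::T
end

# Apply each stored step in order.
(p::PipelineSketch)(X::AbstractMatrix) =
    foldl((Z, step) -> step(Z), p.steps; init = X)

train = [1.0 2.0 4.0; 2.0 4.0 8.0; 3.0 6.0 12.0]
test  = [2.0 5.0 9.0]

# The scaling step is fit on the differenced training data only.
pipe = PipelineSketch((firstdiff, CenterScaleSketch(firstdiff(train))))

train_scaled = pipe(train)
test_scaled  = pipe(test)   # uses the training means/stds, not its own
```

The point of the design is that the test data never contributes to the stored statistics, which is the kind of user error the transformations are meant to prevent.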

Model training

ChemometricsTools offers easy-to-use iterators for K-fold validation and moving window sampling/training. More advanced sampling methods, like Kennard-Stone, are just a function call away. Convenience functions for interval selections, weighting regression ensembles, etc. are also available. These allow for ensemble models like SIPLS, P-DS, P-OSC, etc. to be built quickly. With the tools included both in this package and Base Julia, nothing should stand in your way.
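For readers who haven't met these sampling schemes, here is a rough sketch of both in plain Julia. The helpers kfold_indices and kennard_stone are hypothetical names written for this example, not the package's functions. The K-fold helper uses contiguous folds, so in practice you would likely shuffle the rows first.

```julia
using LinearAlgebra

# Hypothetical helpers for illustration; NOT the ChemometricsTools.jl API.

# Lazily yields (train_indices, test_indices) for each of k contiguous folds.
function kfold_indices(n::Integer, k::Integer)
    2 <= k <= n || throw(ArgumentError("need 2 <= k <= n"))
    # The first rem(n, k) folds get one extra sample.
    sizes  = [div(n, k) + (i <= rem(n, k) ? 1 : 0) for i in 1:k]
    stops  = cumsum(sizes)
    starts = stops .- sizes .+ 1
    return ((setdiff(1:n, starts[i]:stops[i]), collect(starts[i]:stops[i]))
            for i in 1:k)
end

# Kennard-Stone: start with the two most distant samples (rows of X), then
# repeatedly add the sample farthest from its nearest already-selected sample.
function kennard_stone(X::AbstractMatrix, m::Integer)
    n = size(X, 1)
    2 <= m <= n || throw(ArgumentError("need 2 <= m <= number of rows"))
    D = [norm(X[i, :] - X[j, :]) for i in 1:n, j in 1:n]
    i, j = Tuple(argmax(D))
    selected = [i, j]
    mindist = min.(D[:, i], D[:, j])   # distance to nearest selected sample
    while length(selected) < m
        mindist[selected] .= -Inf      # never pick a sample twice
        next = argmax(mindist)
        push!(selected, next)
        mindist = min.(mindist, D[:, next])
    end
    return selected
end

folds = collect(kfold_indices(10, 3))
for (train_idx, test_idx) in folds
    println("train = ", train_idx, "  test = ", test_idx)
end

X = reshape([0.0, 1.0, 2.0, 10.0, 4.0], :, 1)   # five samples, one variable
picked = kennard_stone(X, 3)
```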

Regression Modeling

This package features dozens of regression performance metrics and includes a few built-in plots (Bland-Altman, QQ, Interval Overlays, etc.). The list of regression methods currently includes: CLS, Ridge, Kernel Ridge, LS-SVM, PCR, PLS(1/2), ELMs, Regression Trees, Random Forest… More to come. Chemometricians love regressions!
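As a small illustration of what one of these methods and a couple of these metrics boil down to, here is a sketch of closed-form ridge regression with RMSE and R² in plain Julia. The helper names ridge_fit, rmse, and r2 are assumptions made for this example, not the package's functions. The data is a noise-free toy set with known coefficients, and no intercept is fitted.

```julia
using LinearAlgebra, Statistics

# Hypothetical helpers for illustration; NOT the ChemometricsTools.jl API.

# Closed-form ridge regression without an intercept:
# b = (X'X + λI)^-1 X'y
ridge_fit(X, y, λ) = (X' * X + λ * I) \ (X' * y)

# Root mean squared error and coefficient of determination.
rmse(y, ŷ) = sqrt(mean(abs2, y .- ŷ))
r2(y, ŷ) = 1 - sum(abs2, y .- ŷ) / sum(abs2, y .- mean(y))

X = [1.0 0.0; 0.0 1.0; 1.0 1.0; 2.0 1.0]
y = X * [2.0, -1.0]              # noise-free response, known coefficients

b_ols = ridge_fit(X, y, 0.0)     # λ = 0 reduces to ordinary least squares
ŷ = X * b_ols

b_ridge = ridge_fit(X, y, 1.0)   # λ > 0 shrinks the coefficients toward zero

println("OLS coefficients:   ", b_ols)
println("Ridge coefficients: ", b_ridge)
println("RMSE = ", rmse(y, ŷ), "  R² = ", r2(y, ŷ))
```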

Classification Modeling

The package provides in-house classification encodings (one cold/one hot) and easy-to-retrieve global or multiclass performance statistics. ChemometricsTools currently includes: LDA/PCA with Gaussian discriminants, Hierarchical LDA, multinomial softmax/logistic regression, PLS-DA, K-NN, Gaussian Naive Bayes, Classification Trees, Random Forest, Probabilistic Neural Networks, Linear Perceptrons, and more to come.
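If the encodings are unfamiliar, this is roughly what they look like in plain Julia. The functions onehot and onehot_decode are hypothetical names for this sketch, not the package's API, and "one cold" is assumed here to mean the complement of one hot (all ones except a zero in the class column).

```julia
# Hypothetical helpers for illustration; NOT the ChemometricsTools.jl API.

# One hot: one column per class, with a 1 in the column of the sample's class.
function onehot(labels::AbstractVector)
    classes = sort(unique(labels))
    H = [Int(l == c) for l in labels, c in classes]
    return H, classes
end

# Decode by taking the column with the largest value in each row. This also
# works on class scores or probabilities, e.g. the output of a softmax model.
onehot_decode(H::AbstractMatrix, classes) =
    [classes[argmax(H[i, :])] for i in 1:size(H, 1)]

labels = ["setosa", "virginica", "setosa", "versicolor"]
H, classes = onehot(labels)

# Assumed meaning of "one cold": the complement of the one hot matrix.
C = 1 .- H

decoded = onehot_decode(H, classes)
```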

Specialized tools?

This package has tools for specialized fields of analysis. For instance, fractional derivatives for the electrochemists (and the adventurous), a handful of smoothing methods for spectroscopists, curve resolution for forensics, process fault detection methods, etc. There are certainly plans for other tools for analyzing chemical data that packages in other languages have seemingly left out. Stay tuned.

Where’s the Data?

Right now I don’t have rights to provide much data, but the iris, Tecator meat, and NASA fault detection datasets are included. I’d love for a collaborator to contribute some spectra, chromatograms, etc. Please reach out to me if you wish to collaborate/contribute. There’s a good chance that in a week or so I’ll be reaching out to the community for these sorts of things. In the meantime, you can load in your own datasets using the Julia ecosystem.

What about Time Series? Cluster models?

Well, I’d love to hammer in some time series methods. That was originally part of the plan. Then I realized OnlineStats.jl already has pretty much everything covered. Similarly, if you want clustering methods, just install Clustering.jl. I may add a few supportive odds and ends in here (or contribute to those packages directly), but most of the Julia 1.0+ ecosystem is reliable, well made, and community supported.

Not sure if this is the right place for this - but I think I’d like to reach out for a little help. Life has hit me pretty hard lately and I don’t have as much time as I’d like to work on this package. But I noticed a problem: it takes a long time to build the package! I think it’s because of the docs - does anyone wanna lend a hand, grok how I’ve done my CI/CD and documentation, and see if there’s a better way? I’d really appreciate it!

@anon92994695 Hi there! I’m actually a doctoral student who was recently introduced to Julia. There’s a chance I might be implementing some chemometric methods in my thesis. If so, I’d be happy to help build out this package with you.

Awesome, Taylor. Feel free to raise issues, and we can discuss solutions to any problems you have. I’d be happy to have a colleague or 10 on that package; it’s a lot of work. If you’d like, I can point to obvious things to improve/test - my free time is just super limited and I am making a new Julia package. Please feel free to message me on here or on GitHub!

Great, Casey, thanks for the quick response! Will let you know when I get the official go-ahead to build this guy out. Will follow up with you via GitHub per your suggestion.

If you don’t get the official go-ahead, please feel free to suggest things to incorporate or to report any bugs you see! I get more motivation when people tell me what is useful to them.

I just cut a new release - v0.5.1, which removes Plots.jl as a dependency and uses RecipesBase instead. Now the package compiles MUCH more quickly! I would like to thank @tamasgal for their advice in doing this.
