[ANN] PartialLeastSquaresRegressor.jl

Hello everyone.

This is an open invitation to the community. PartialLeastSquaresRegressor.jl is a Julia package for solving regression problems, especially when there are few samples. It is well suited to domains such as biomedicine, healthcare, and chemistry, among others.

The package now has a new interface and is registered with MLJ. Soon it will be possible to load it with @load from MLJ.

If you want to contribute to the package, you are welcome!

Please visit the repo at PartialLeastSquaresRegressor.jl

Below, a little bit of development history for those interested:

Three years ago I started developing a package for an algorithm called Partial Least Squares Regressor, which is very efficient for problems with few samples. I believe this was the first implementation of this regressor in the Julia language. At the time, if I remember correctly, there was no implementation even in Scikit-learn or in R packages. During development, I implemented three versions of the algorithm: PLS1, a linear regressor with a single target; PLS2, a multitarget linear regressor; and finally Kernel PLS, a multitarget regressor for non-linear problems. All of these are in one package. Over time some colleagues helped us, mainly @filipebraida.

Recently the MLJ team (@ablaom, @tlienart), with the help of @azev77, found us and helped us a lot in adding an MLJ interface. We also renamed the package for greater conciseness and clarity.

Thank you all!

14 Likes

Nice, I had a PLS regression in ChemometricsTools maybe a year ago.

If you want, you can copy and paste my multiway PLS code (https://github.com/caseykneale/ChemometricsTools.jl/blob/d8cd288ae76b221274a54cc204cd146791bddf98/src/MultiWay.jl#L228) into the package for people doing tensor ordered regressions.

5 Likes

Can I suggest an improvement to how you form your predictions?

Follow the approach given in Martens H., Næs T. Multivariate Calibration. Wiley: New York, 1989,
as shown here:
https://github.com/caseykneale/ChemometricsTools.jl/blob/d8cd288ae76b221274a54cc204cd146791bddf98/src/RegressionModels.jl#L212

This turns inference into a single matmul, because PLS truly does follow Y = XB when the data are centered and scaled. It should work for both PLS1 and PLS2.
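For readers who want to see the idea concretely, here is a minimal sketch for PLS1 fitted with NIPALS. It collapses the weights `W`, X loadings `P`, and y loadings `q` into one coefficient vector with the textbook identity `b = W * inv(P'W) * q`, so prediction becomes a single matrix-vector product. The names `pls1_coefficients` and `pls_predict` are made up for illustration and are not the API of PartialLeastSquaresRegressor.jl or ChemometricsTools.jl. The sketch assumes the data are only centered (no scaling), with the centering folded into an intercept.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not the package's API): fit PLS1 by NIPALS and
# return a coefficient vector plus intercept for one-shot prediction.
function pls1_coefficients(X::AbstractMatrix, y::AbstractVector, ncomp::Integer)
    μx = mean(X, dims=1)            # 1×p row of column means
    μy = mean(y)
    Xa = X .- μx                    # centered copy, deflated in the loop
    ya = y .- μy
    p = size(X, 2)
    W = zeros(p, ncomp)             # weights
    P = zeros(p, ncomp)             # X loadings
    q = zeros(ncomp)                # y loadings
    for a in 1:ncomp
        w = normalize(Xa' * ya)     # unit-norm weight vector
        t = Xa * w                  # scores for this component
        tt = dot(t, t)
        pa = (Xa' * t) ./ tt
        q[a] = dot(ya, t) / tt
        Xa .-= t * pa'              # deflate X
        ya .-= q[a] .* t            # deflate y
        W[:, a] = w
        P[:, a] = pa
    end
    b = W * ((P' * W) \ q)          # coefficients in the original X space
    b0 = μy - dot(vec(μx), b)       # intercept absorbs the centering
    return b, b0
end

# Prediction is then a single matrix-vector product plus the intercept.
pls_predict(Xnew::AbstractMatrix, b, b0) = Xnew * b .+ b0

# Small made-up data set, only to show the calls.
X = [1.0 2.0 0.5; 2.0 1.0 1.5; 3.0 4.0 2.0; 4.0 3.0 3.5; 5.0 6.0 1.0; 6.0 5.0 4.0]
y = [1.2, 2.9, 4.1, 6.3, 5.2, 8.8]
b, b0 = pls1_coefficients(X, y, 2)
ŷ = pls_predict(X, b, b0)
```

A side benefit of this form is that `b` can be inspected or plotted directly. Scaling could be folded into `b` and `b0` in the same way as the centering.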

Also to be complete you could add SIMPLS. Some users favor this algorithm for performance.

3 Likes

Also, you might want to add diagnostics like leverage, explained variance in X and Y, and Q and Hotelling statistics, as in here: https://github.com/caseykneale/ChemometricsTools.jl/blob/master/src/ModelAnalysis.jl#L21. This goes in line with forming a regression coefficient vector: many people will inspect their regression coefficients visually or mathematically. So I really recommend the approach in the previous post.

PLS is close to useless without those kinds of diagnostics…
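To make those terms concrete, here is a rough sketch of how such diagnostics can be computed for PLS1 from the NIPALS scores and loadings. It uses common textbook definitions (leverage as `1/n` plus normalized squared scores, Hotelling T² from score variances, Q as the squared X residual per sample); the linked ChemometricsTools code may normalize things differently. The function name `pls1_diagnostics` and the returned field names are hypothetical, and the data are only centered, not scaled.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not part of either package: run PLS1 (NIPALS) and
# collect common per-sample and per-component diagnostics.
function pls1_diagnostics(X::AbstractMatrix, y::AbstractVector, ncomp::Integer)
    n = size(X, 1)
    Xa = X .- mean(X, dims=1)                 # centered copy, deflated below
    ya = y .- mean(y)
    ssx, ssy = sum(abs2, Xa), sum(abs2, ya)   # total sums of squares
    T = zeros(n, ncomp)                       # scores
    xvar = zeros(ncomp)
    yvar = zeros(ncomp)
    for a in 1:ncomp
        w = normalize(Xa' * ya)
        t = Xa * w
        tt = dot(t, t)
        p = (Xa' * t) ./ tt
        qa = dot(ya, t) / tt
        Xa .-= t * p'                         # deflate X
        ya .-= qa .* t                        # deflate y
        T[:, a] = t
        xvar[a] = tt * dot(p, p) / ssx        # fraction of X variance explained
        yvar[a] = tt * qa^2 / ssy             # fraction of y variance explained
    end
    tts = vec(sum(abs2, T, dims=1))           # t_a' t_a for each component
    h = vec(sum(T .^ 2 ./ tts', dims=2))      # normalized squared scores per sample
    leverage = 1 / n .+ h
    hotelling = (n - 1) .* h                  # T² with score variance t't/(n-1)
    qresid = vec(sum(abs2, Xa, dims=2))       # Q: squared X residual per sample
    return (leverage = leverage, hotelling = hotelling, qresid = qresid,
            xvar = xvar, yvar = yvar)
end

# Small made-up data set, only to show the call.
X = [1.0 2.0 0.5; 2.0 1.0 1.5; 3.0 4.0 2.0; 4.0 3.0 3.5; 5.0 6.0 1.0; 6.0 5.0 4.0]
y = [1.2, 2.9, 4.1, 6.3, 5.2, 8.8]
diag = pls1_diagnostics(X, y, 2)
```

Samples with high leverage or a large Q residual are the usual candidates for a closer look, and `cumsum(diag.xvar)` or `cumsum(diag.yvar)` gives a quick view of how much each added component contributes.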

2 Likes

Good suggestions @anon92994695 !!!

However, I need help from people like you, because I am busy with various tasks. Would you like to be a collaborator?

4 Likes

I am kind of involved in a lot of stuff too, but I could contribute most of that stuff for you pretty easily. Maybe sometime this week I could get to it, if that’s OK?

I don’t know how or if the MLJ hooks can handle bilinear things, but I can at least add them to the base repo.

Wow, that’s great!

I will add you to the project.

2 Likes

Cool, good job!

Also, a slight correction: R has in fact had PLS since 2007. See the pls R package (still maintained, with the last release in August 2020) and the initial paper, The pls Package: Principal Component and Partial Least Squares Regression in R (Journal of Statistical Software).

4 Likes

What a mistake I made about R. Thanks.

I wouldn’t worry about it. I feel like you could say “R already did it” for tons of stats things, like in that South Park episode where they utter the phrase “the Simpsons already did it!”

3 Likes

Hi. I will create some issues in the repo. Some of the enhancements are: a) benchmark each algorithm and find ways to improve execution time; b) implement SIMPLS; c) an automatic way of choosing the number of factors using variance; d) nice documentation; e) be the best and fastest PLS package compared with other languages!!! :)

2 Likes

@anon92994695

Sent you an invite. I created some issues there regarding what you suggested.

1 Like

Some interesting issues after a talk.

https://github.com/lalvim/PartialLeastSquaresRegressor.jl/issues

1 Like

Cool! Any reason you didn’t call it simply PartialLeastSquares.jl? That would save some typing and I don’t think there’s a risk of confusion with other methods.

1 Like

There are actually a few other methods called Partial Least Squares that are not the same as PLSR: partial least squares discriminant analysis (PLSDA) and partial least squares structural equation modelling (PLS-SEM). So the name is fine. Abbreviating to something like “PLSRegression.jl” could work.

1 Like

Actually, this is the old name.

There was an issue about this.

To summarize, there were some conventions for the MLJ interface that I decided to follow. The module and the function had the same name.

Nice! I did a manual and quite bad implementation of the NIPALS algorithm a little while ago for an assignment in Linear Models. Definitely not as elegant as your code.

However, one thing I would love is an argument to return the residual matrices.

1 Like

For information, in R there is also

with many PLS algorithms and other chemometrics things.

You may also have a look at (current package in development)

for Julia.