[ANN] PartialLeastSquaresRegressor.jl

Hello everyone.

This is an open invitation to the community. PartialLeastSquaresRegressor.jl is a Julia package for solving regression problems, especially when there are few samples. It is well suited to domains such as biomedicine, healthcare, and chemistry, among others.

The package now has a new interface and is registered with MLJ. Soon it will be possible to load it with @load from MLJ.

If you want to contribute to the package, you are welcome!

Please visit the repo at PartialLeastSquaresRegressor.jl

Below, a little bit of development history for those interested:

Three years ago I started developing a package for an algorithm called Partial Least Squares Regression, which is very efficient for problems in which we have few samples. This was, I believe, the first implementation of this regressor in the Julia language. At the time, if I remember correctly, there was no implementation even in Scikit-learn or in R packages. During development, I implemented three versions of the algorithm: PLS1, a linear regressor with a single target; PLS2, a multitarget linear regressor; and finally Kernel PLS, a multitarget regressor for non-linear problems. All of these are in one package. Over time some colleagues helped us, mainly @filipebraida.

Recently the MLJ team (@ablaom, @tlienart), with the help of @azev77, found us and helped us a lot in adding an interface for MLJ. We also changed the name of the package for greater conciseness and clarity.

Thank you all!

12 Likes

Nice, I had a PLS regression in ChemometricsTools maybe a year ago.

If you want, you can copy and paste my multiway PLS code (https://github.com/caseykneale/ChemometricsTools.jl/blob/d8cd288ae76b221274a54cc204cd146791bddf98/src/MultiWay.jl#L228) into the package for people doing tensor ordered regressions.

4 Likes

Can I suggest an improvement to how you form your predictions?

Follow the approach given in: Martens H., Næs T. Multivariate Calibration. Wiley: New York, 1989.
as shown here

This turns inference into a single matmul, because PLS does truly follow Y = XB when the data are centered and scaled. It should work for PLS1 and PLS2.
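For anyone following along, here is a minimal sketch of that idea. It is written in plain Python only to keep it dependency-free, and it is not the package's Julia code. It assumes X and y are already centered (and scaled, if desired) and uses a NIPALS-style PLS1 loop. The names `pls1_coefficients` and `predict` are made up for illustration. The coefficient vector is accumulated as b = sum of q_a * r_a, where the rotated weights r_a satisfy t_a = X r_a on the original (undeflated) X, so prediction reduces to one matrix-vector product.

```python
def pls1_coefficients(X, y, n_components):
    """Return b such that y_hat = X b, for centered X (list of rows) and centered y."""
    n, m = len(X), len(X[0])
    Xk = [row[:] for row in X]      # working copy of X that gets deflated
    loadings, rotations = [], []    # p_a and r_a for each extracted component
    b = [0.0] * m
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(Xk[i][j] * y[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:            # nothing left to explain
            break
        w = [v / norm for v in w]
        # Scores, X-loadings, and the y-loading for this component.
        t = [sum(Xk[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        # Rotate w so that the scores can be computed from the original X.
        r = w[:]
        for pj, rj in zip(loadings, rotations):
            c = sum(pv * wv for pv, wv in zip(pj, w))
            r = [rv - c * rjv for rv, rjv in zip(r, rj)]
        loadings.append(p)
        rotations.append(r)
        b = [bv + q * rv for bv, rv in zip(b, r)]
        # Deflate X by the part explained by this component.
        for i in range(n):
            for j in range(m):
                Xk[i][j] -= t[i] * p[j]
    return b


def predict(X, b):
    """Inference as a single matrix-vector product."""
    return [sum(xv * bv for xv, bv in zip(row, b)) for row in X]


# Toy data: both columns of X and y have zero mean, and y = 1*x1 - 2*x2 exactly.
X = [[-2.0, -1.0], [-1.0, 1.0], [0.0, -1.0], [1.0, 2.0], [2.0, -1.0]]
y = [0.0, -3.0, 2.0, -3.0, 4.0]
b = pls1_coefficients(X, y, n_components=2)
y_hat = predict(X, b)
```

With as many components as X has independent columns, PLS1 coincides with ordinary least squares, so on this toy data `b` should come out close to [1, -2]. With fewer components you get the usual regularized PLS solution, still applied through the same single product.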

Also to be complete you could add SIMPLS. Some users favor this algorithm for performance.

1 Like

Also you might want to add diagnostics like leverage, explained variance in X and Y, and Q and Hotelling statistics, as in here: https://github.com/caseykneale/ChemometricsTools.jl/blob/master/src/ModelAnalysis.jl#L21. This goes in line with forming a regression coefficient vector, since many people will inspect their regression coefficients visually or mathematically. So I really recommend the approach in the previous post.

PLS is close to useless without those kinds of diagnostics…
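As a rough illustration of two of those diagnostics, the sketch below computes leverage and Hotelling's T² from a score matrix T (one row per sample, one column per component). It is in plain Python to stay dependency-free, uses common textbook definitions, and is not taken from ChemometricsTools or from this package. The function names are hypothetical. It assumes the score columns are mean-centered and mutually orthogonal, as PLS scores of centered data are. Conventions vary; for example, some authors add 1/n to the leverage.

```python
def leverage(T):
    """Leverage per sample: h_i = sum over components of t_ia^2 / (t_a' t_a)."""
    n, a = len(T), len(T[0])
    col_ss = [sum(T[i][k] ** 2 for i in range(n)) for k in range(a)]
    return [sum(T[i][k] ** 2 / col_ss[k] for k in range(a)) for i in range(n)]


def hotelling_t2(T):
    """Hotelling's T^2 per sample: sum over components of t_ia^2 / var(t_a)."""
    n, a = len(T), len(T[0])
    # Sample variance of each score column (columns assumed to have zero mean).
    col_var = [sum(T[i][k] ** 2 for i in range(n)) / (n - 1) for k in range(a)]
    return [sum(T[i][k] ** 2 / col_var[k] for k in range(a)) for i in range(n)]


# Toy score matrix with two orthogonal, zero-mean columns.
T = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
h = leverage(T)
t2 = hotelling_t2(T)
```

Under these definitions the leverages sum to the number of components, and T² is just the leverage multiplied by n - 1, so samples that stand out in one will stand out in the other.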

1 Like

Good suggestions @ckneale !!!

However, I need help from people like you, because I am busy with various tasks. Would you like to be a collaborator?

3 Likes

I am kind of involved in a lot of stuff too, but I could contribute most of that stuff for you pretty easily. Maybe sometime this week I could get to it if that’s OK?

I don’t know how or if the MLJ hooks can handle bilinear things, but I can at least add them to the base repo.

Wow, that’s great!

I will add you to the project.

2 Likes

Cool, good job!

Also, a slight correction, but R in fact has had pls since 2007. See the pls R package (still maintained with the last release in August 2020) at https://mevik.net/work/software/pls.html and the initial paper https://www.jstatsoft.org/article/view/v018i02.

3 Likes

What a mistake I made about R. Thanks.

I wouldn’t worry about it. I feel like you could say "R already did it" for tons of stats things, like in that South Park episode where they keep saying "Simpsons already did it!"

3 Likes

Hi. I will create some issues in the repo. Some of the enhancements are: a) perform benchmarks for each algorithm and find ways to improve execution time b) implement SIMPLS c) an automatic way of finding the number of factors using variance d) nice documentation … e) be the best and fastest PLS package compared with other languages !!! :slight_smile:

2 Likes

@ckneale

Sent you an invite. I created some issues there regarding what you suggested.

1 Like

Some interesting issues after a talk.

1 Like

Cool! Any reason you didn’t call it simply PartialLeastSquares.jl? That would save some typing and I don’t think there’s a risk of confusion with other methods.

1 Like

There are actually a few other methods called Partial Least Squares that are not the same as PLSR: partial least squares discriminant analysis (PLSDA) and partial least squares structural equation modelling (PLS-SEM). So the name is fine. Abbreviating to something like “PLSRegression.jl” could work.

1 Like

Actually, this is the old name.

There was an issue about this.

Summarizing, there were some patterns regarding the MLJ interface of the methods that I decided to follow. The module and the function had the same name.