This is an open invitation to the community. PartialLeastSquaresRegressor.jl is a Julia package for solving regression problems, especially when there are few samples. It is well suited to domains such as biomedicine, healthcare, and chemistry, among others.
The package now has a new interface and is registered with MLJ. Soon it will be loadable directly via MLJ's @load macro.
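Once registration goes through, usage should look roughly like this. This is only a sketch: the model name `PLSRegressor` and the `n_factors` hyperparameter are my assumptions, so check the MLJ model registry for the exact identifiers.

```julia
using MLJ

# Hypothetical model name and hyperparameter; consult the MLJ model
# registry for the exact identifiers once registration is complete.
PLS = @load PLSRegressor pkg=PartialLeastSquaresRegressor

model = PLS(n_factors=3)        # number of latent factors (assumed keyword)
mach  = machine(model, X, y)    # X: table of features, y: continuous target
fit!(mach)
yhat = predict(mach, Xnew)
```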
If you want to contribute to the package, you are very welcome!
Below is a little development history for those interested:
Three years ago I started developing a package around an algorithm called the Partial Least Squares regressor, which is very efficient for problems with few samples. I believe this was the first implementation of this regressor in the Julia language; at the time, if I remember correctly, there was no implementation even in Scikit-learn or in R packages. During development, I implemented three versions of the algorithm: PLS1, a linear regressor with a single target; PLS2, a multi-target linear regressor; and finally, Kernel PLS, a multi-target regressor for non-linear problems. All of these in one package. Over time some colleagues helped us, mainly @filipebraida.
Recently, the MLJ team (@ablaom, @tlienart), with the help of @azev77, found us and helped us a lot to build an interface for MLJ. Ah, we also changed the name of the package for greater conciseness and clarity.
Also, you might want to add diagnostics such as leverage, explained variance in X and Y, and the Q and Hotelling's T² statistics, as in https://github.com/caseykneale/ChemometricsTools.jl/blob/master/src/ModelAnalysis.jl#L21. This goes hand in hand with forming a regression coefficient vector; many people will inspect their regression coefficients visually or mathematically. So I really recommend the approach in the previous post.
PLS is close to useless without those kinds of diagnostics…
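For concreteness, here is a rough sketch of those diagnostics, assuming a fitted bilinear model X ≈ T * P' with score matrix `T` (n × a) and X-loading matrix `P` (p × a). The names are illustrative, not the package's internals:

```julia
using LinearAlgebra, Statistics

# Sketch of common PLS diagnostics for a fitted bilinear model X ≈ T * P',
# with scores T (n × a) and X-loadings P (p × a). Names are illustrative.

# Hotelling's T²: per-sample distance in score space, scaled by score variances.
hotelling_t2(T) = vec(sum(T.^2 ./ var(T, dims=1), dims=2))

# Q statistic (squared prediction error): what the model leaves unexplained in X.
q_statistic(X, T, P) = vec(sum(abs2, X .- T * P', dims=2))

# Leverage of each sample in score space.
function leverage(T)
    G = inv(T' * T)
    [T[i, :]' * G * T[i, :] for i in 1:size(T, 1)]
end

# Fraction of the variance in X captured by the model.
explained_variance_x(X, T, P) = 1 - sum(abs2, X .- T * P') / sum(abs2, X)
```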
I am kind of involved in a lot of stuff too, but I could contribute most of that pretty easily. Maybe sometime this week I could get to it, if that's OK?
I don't know how (or if) the MLJ hooks can handle bilinear things, but I can at least add them to the base repo.
I wouldn't worry about it. I feel like you could say R already did it for tons of stats things, like in that South Park episode where they utter the phrase "the Simpsons already did it!"
Hi. I will create some issues in the repo. Some of the enhancements are:
a) perform benchmarks for each algorithm and find ways to improve execution time;
b) implement SIMPLS;
c) an automatic way of finding the number of factors using explained variance (see the sketch below);
d) nice documentation;
e) be the best and fastest PLS package compared with those in other languages!
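On (c), one simple approach is to pick the smallest number of latent factors whose cumulative explained variance crosses a threshold. A minimal sketch, where `explvar` (per-factor explained-variance fractions) and the 0.95 threshold are just assumptions:

```julia
# Pick the smallest number of factors whose cumulative explained variance
# reaches the threshold. `explvar` is a hypothetical vector of per-factor
# explained-variance fractions, not a package API.
function choose_n_factors(explvar; threshold=0.95)
    findfirst(>=(threshold), cumsum(explvar))
end

choose_n_factors([0.60, 0.25, 0.08, 0.04, 0.03])  # cumsum hits 0.97 at 4 factors
```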
Cool! Any reason you didn’t call it simply PartialLeastSquares.jl? That would save some typing and I don’t think there’s a risk of confusion with other methods.
There are actually a few other methods called Partial Least Squares that are not the same as PLSR: partial least squares discriminant analysis (PLS-DA) and partial least squares structural equation modelling (PLS-SEM). So the name is fine. Abbreviating it to something like "PLSRegression.jl" could work.
Nice! I did a manual and quite bad implementation of the NIPALS algorithm a little while ago for an assignment in Linear Models. Definitely not as elegant as your code.
However, one thing I would love is an argument to return the residual matrices.
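Something like this minimal PLS1 NIPALS sketch, which returns the residual matrices E and F alongside the model, assuming X and y are already centered (the function and field names are mine, not the package's):

```julia
using LinearAlgebra

# Minimal NIPALS PLS1 sketch (single response), assuming centered X (n × p)
# and y (length-n vector). Returns weights W, scores T, loadings P, inner
# coefficients b, and the residual matrices E (of X) and F (of y).
function nipals_pls1(X, y, a)
    n, p = size(X)
    W, T, P, b = zeros(p, a), zeros(n, a), zeros(p, a), zeros(a)
    E, F = copy(X), copy(y)
    for k in 1:a
        w = E' * F; w /= norm(w)        # weight vector from the X–y covariance
        t = E * w                       # scores along that direction
        pk = E' * t / dot(t, t)         # X-loadings
        bk = dot(t, F) / dot(t, t)      # inner regression coefficient
        E -= t * pk'                    # deflate X
        F -= bk * t                     # deflate y
        W[:, k], T[:, k], P[:, k], b[k] = w, t, pk, bk
    end
    return (W=W, T=T, P=P, b=b, E=E, F=F)
end
```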