Seeking advice on contributing to Julia: A faster Lasso solver

Hi there,

I’m a Master’s student in Statistics. I’ve recently been researching ‘safe screening rules’ for accelerating Lasso solvers. These can improve computation times by orders of magnitude when the number of features is large.
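For anyone unfamiliar with the idea, below is a minimal sketch of one of the simplest such rules, the basic SAFE rule of El Ghaoui et al., for the objective 0.5*‖y - Xβ‖² + λ‖β‖₁. It is only meant as an illustration and is not necessarily the rule the package would end up using; the function name `safe_screen` is made up for this example.

```julia
using LinearAlgebra

# Basic SAFE rule for  min_β  0.5*‖y - Xβ‖² + λ‖β‖₁.
# Returns a Bool vector: `true` means feature j is guaranteed to have β_j = 0
# at the optimum, so its column can be dropped before running the solver.
# (`safe_screen` is a hypothetical name used only for this sketch.)
function safe_screen(X::AbstractMatrix, y::AbstractVector, λ::Real)
    c = abs.(X' * y)                              # |x_j' y| for every feature
    λmax = maximum(c)                             # smallest λ for which β = 0 is optimal
    colnorms = [norm(col) for col in eachcol(X)]  # ‖x_j‖₂ for every feature
    return c .< λ .- colnorms .* norm(y) .* (λmax - λ) / λmax
end

# Tiny usage example with an orthonormal design
X = Matrix(1.0I, 3, 3)
y = [3.0, 0.1, 0.0]
discard = safe_screen(X, y, 2.0)   # flags features 2 and 3 as safely removable
```

The solver then only needs to run on `X[:, .!discard]`, which is where the speed-up comes from when most features are screened out.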

I’d like to create a package implementing these methods in Julia to create a faster Lasso solver than the standard co-ordinate descent algorithm. The reason for my post is that, while I’m a fairly experienced programmer and I (think I) know how to write good, maintainable code, I have never contributed to open-source projects before. I’m aware that writing code for use by others comes with a lot of baggage that writing code for oneself does not, so I’m looking for advice, be it general tips on open source development or Julia-specific advice.

Some specific questions I have in mind (apologies if any of these are overly silly!):

  • There’s a Lasso.jl package already, implementing the standard co-ordinate descent solver. Would it be best for me to implement my algorithm within this package - modelling the code style etc on this package - and submit a pull request? Or develop my own package, ‘FastLasso.jl’ or something?

  • In the former case, how do I convince the maintainers of the package that my code and the mathematics are sound? I could definitely write up a mathematical report of sorts - would that be the expected approach? And I’m guessing they’d want to see my solver and the existing solver applied to the same regression problem, so that the two solutions can be verified to be the same (and computation times compared)?
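A comparison along those lines might look like the sketch below. It uses only the standard library: `lasso_cd` is a plain cyclic co-ordinate descent written for this example (it is not Lasso.jl's implementation), `compare_solvers` is a hypothetical helper name, and the "candidate" here is just a stand-in that would be replaced by the screened solver.

```julia
using LinearAlgebra, Random

# Soft-thresholding operator
soft(z, t) = sign(z) * max(abs(z) - t, 0)

# Plain cyclic co-ordinate descent for 0.5*‖y - Xβ‖² + λ‖β‖₁ (reference solver)
function lasso_cd(X, y, λ; iters=500)
    p = size(X, 2)
    β = zeros(p)
    r = float.(y)                      # residual y - Xβ; β starts at zero
    for _ in 1:iters, j in 1:p
        xj = view(X, :, j)
        nj = dot(xj, xj)
        old = β[j]
        β[j] = soft(dot(xj, r) + nj * old, λ) / nj
        r .-= xj .* (β[j] - old)       # keep the residual in sync
    end
    return β
end

# Run both solvers on the same problem; report the largest coefficient
# difference and wall-clock times (the first call includes compilation).
function compare_solvers(reference, candidate, X, y, λ)
    t0 = time(); β_ref = reference(X, y, λ); t_ref = time() - t0
    t0 = time(); β_new = candidate(X, y, λ); t_new = time() - t0
    return (maxdiff = maximum(abs.(β_ref .- β_new)), t_ref = t_ref, t_new = t_new)
end

Random.seed!(1)
X = randn(50, 10)
y = randn(50)
λ = 0.5 * maximum(abs.(X' * y))

# Stand-in candidate: replace with the new solver under test
candidate = (X, y, λ) -> lasso_cd(X, y, λ; iters=1000)
res = compare_solvers(lasso_cd, candidate, X, y, λ)
```

For timings worth reporting, something like `@btime` from BenchmarkTools.jl would be more reliable than a single `time()` difference. Checking the optimality conditions directly (every `|x_j' r|` at most λ at the solution) is another check that does not depend on trusting either solver.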

Thanks!

Personally, I prefer fewer packages. But sometimes a PR is a drain on another maintainer’s time so it’s a difficult one.

I would write the code and keep it in a separate repo first, and then once I am happy with the result, I would write to the Lasso.jl maintainer to see if there’s appetite and bandwidth to incorporate it. If there is, then I would spend the effort to make it into a PR. Otherwise I would make it into a separate package.

I would also start the conversation with the Lasso.jl maintainer as soon as I can, preferably now, to communicate the intention and options.

I spent some time thinking about this.
There are some areas in the Julia ecosystem that are pretty organized, where cooperation is easy. This doesn’t seem like one of those areas… yet.

Here are some packages for penalized regression:
MLJ Linear Models @tlienart might be your best bet
Sparse Regression @joshday
Lasso.jl
Subset Selection
GLMNet (Fortran wrapper)
Orthogonal EM (unmaintained)
LARS (old)
Trimmed Lasso
FISTA, IHT (L1, MCP)
Constrained L1
EmpiricalRisks.jl

I’m sure there are many others.

It breaks my heart to see beautiful code & effort go to waste when people retire from maintaining a package. It would be great if we made @tim.holy’s strategy part of our culture: when he was ready to step down from maintaining Interpolations.jl, he posted “Interpolations.jl needs a new maintainer”, and instead of letting Interpolations.jl disappear into the abyss, another user offered to take over.

This is extremely helpful. Thank you.

Jeez, that’s a lot of packages for regression!

I actually tried searching for sparse regression solvers in Julia and the only packages I came across were Lasso.jl and glmnet.jl within the JuliaStats ecosystem. Either I’m awful at searching for Julia packages, or these packages are difficult to find for beginners to the Julia language (which I am), which isn’t ideal.

Out of curiosity, how did you know of these packages? Is it because you’re active within the Julia community, or are they advertised in a nice fashion somewhere that I’ve completely missed?

I copied and pasted these packages from a list I had prepared months ago when I was new to Julia & wanted to contribute to some of the ML packages.

There is a ton of great Julia stuff that is hard to discover.

Here is what I wrote about this in March:

To repeat, MLJLinearModels.jl is likely your best bet, probably because it has great maintainers & is part of the AlanTuringInstitute.

When implementing a specific, mostly standalone algorithm, I think it is best to put it in a small package, well-tested, documented, with a very lightweight interface.

This has the following advantages:

  1. the code should remain functional for a long time, because of the simple setup, and should be easy to update when needed,
  2. multiple packages can make use of it, possibly incorporating the algorithm into their own interface via a wrapper. All will benefit from improvements, and consequently the motivation to contribute increases,
  3. if a particular dependent project is abandoned, your algorithm implementation will continue to be around (as opposed to having to be excavated from a complex and abandoned codebase).

All this being said… In my opinion all “penalized regression” can be thought of as Bayesian models with MAP estimators… instead of having lots of different ways of achieving that, it’s better to have a general-purpose modeling language and some really good optimization routines… voilà, you have regression for free.
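To spell out that connection for the Lasso specifically (a standard result, written here under the assumptions of Gaussian noise with variance σ² and independent Laplace priors with scale b on the coefficients):

```latex
\begin{aligned}
y \mid X, \beta &\sim \mathcal{N}(X\beta,\ \sigma^2 I),
  \qquad \beta_j \overset{\text{iid}}{\sim} \mathrm{Laplace}(0, b), \\
\log p(\beta \mid y, X)
  &= -\frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
     - \frac{1}{b}\,\lVert \beta \rVert_1 + \text{const}, \\
\hat\beta_{\mathrm{MAP}}
  &= \arg\min_{\beta}\ \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
     + \lambda\,\lVert \beta \rVert_1,
  \qquad \lambda = \sigma^2 / b .
\end{aligned}
```

So maximizing the posterior is the same optimization problem as the Lasso, with the penalty weight determined by the noise variance and the prior scale.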

This is great, thanks for your input. You’ve convinced me to go down this route!
