For almost two years now I have been using Flux and Lux, and thus the Optimisers.jl package. Since, for scientific reasons, I also need to test optimizers that Optimisers.jl does not support, I wrote them myself.
I would now like to offer them to Optimisers.jl, but before I open a pull request there I want to discuss/learn how this works.
In effect I want to contribute two optimizers: Yogi and AdaBelief with weight decay.
- Is there a reason why Optimisers.jl does not support the Yogi optimizer? (A sketch of how I imagine it fitting the rule interface follows after this list.)
- Regarding AdaBelief with weight decay: this is my daily workhorse. Here I need to learn how it could be implemented, because I cannot simply add the weight decay to the existing code; the algorithm would then no longer follow the scientific paper about AdaBelief. So do I, for example, have to find a new name for it? (See the second sketch below.)
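
For context on the Yogi question: as far as I understand the Optimisers.jl documentation, a new rule only needs a struct subtyping `Optimisers.AbstractRule` plus `init` and `apply!` methods. Below is a minimal sketch of how I imagine Yogi would slot in; the struct name, field names, and defaults (η = 0.01, ϵ = 1e-3, as suggested in the Yogi paper) are my own choices, and I omit Adam-style bias correction for brevity:

```julia
using Optimisers

# Sketch of Yogi as a custom Optimisers.jl rule (names and defaults are mine).
struct Yogi{T} <: Optimisers.AbstractRule
  eta::T
  beta::Tuple{T,T}
  epsilon::T
end
Yogi(; eta = 0.01, beta = (0.9, 0.999), epsilon = 1e-3) = Yogi(eta, beta, epsilon)

# State: first and second moment, both initialised to zero.
Optimisers.init(o::Yogi, x::AbstractArray) = (zero(x), zero(x))

function Optimisers.apply!(o::Yogi, state, x::AbstractArray{T}, dx) where T
  η, β, ϵ = T(o.eta), T.(o.beta), T(o.epsilon)
  mt, vt = state

  @. mt = β[1] * mt + (1 - β[1]) * dx
  # Yogi's signed additive second-moment update -- the only change vs. Adam:
  @. vt = vt - (1 - β[2]) * sign(vt - abs2(dx)) * abs2(dx)
  dx′ = @. η * mt / (sqrt(vt) + ϵ)

  # Return the new state and the amount that will be subtracted from x.
  return (mt, vt), dx′
end

# Used like any built-in rule:
# opt_state = Optimisers.setup(Yogi(), model)
# opt_state, model = Optimisers.update!(opt_state, model, grad)
```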
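And for the AdaBelief question: if I understand correctly, Optimisers.jl builds AdamW by chaining `Adam` with `WeightDecay` through `OptimiserChain`, so the decoupled weight decay never touches the `Adam` code itself. If that is right, the same pattern should give AdaBelief with weight decay without modifying the published algorithm, and only the name (e.g. `AdaBeliefW`, by analogy with AdamW) would be new. A sketch, assuming the exported `AdaBelief`, `WeightDecay`, and `OptimiserChain` as documented:

```julia
using Optimisers

# Hypothetical "AdaBeliefW", named by analogy with AdamW; the name and the
# default λ are my own, everything else uses exported Optimisers.jl rules.
AdaBeliefW(η = 0.001, β = (0.9, 0.999), λ = 1e-4, ϵ = 1e-16) =
    OptimiserChain(AdaBelief(η, β, ϵ), WeightDecay(λ))
```

Because `WeightDecay` comes after `AdaBelief` in the chain, the decay term is added to the already-rescaled update, i.e. it stays decoupled in the sense of the AdamW paper rather than acting as plain L2 regularization folded into the gradient.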
I put the code I have so far online:
and