Has anyone tried implementing the LiMuon optimizer?
I opened an issue here in the Optimisers.jl package, but thought I’d ask here as well: has anyone implemented it elsewhere? Any hands-on experience with the method itself would also be interesting to hear about.
According to the paper, it beats AdamW on a few benchmarks, in both training/test error and convergence speed.