I had the pleasure of giving a lecture on the basics of neural nets at a university, and I deliberately used Julia for the practical lessons as well.
As a side effect, I wanted to explain to my students how an optimizer can be derived from scratch and why current first-order optimizers use exponential moving averages (EMA). So I deliberately designed a non-EMA optimizer, with the unforeseen result that in all test cases from the lecture it was a factor of 2 to 20 faster than Adam.
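For readers who haven't seen it spelled out: by EMA I mean the exponential moving average of gradients that optimizers like Adam keep for their moment estimates. A minimal sketch (in Python here just for illustration; the optimizer from the paper is different and written in Julia, and `ema_update` is only an illustrative helper name):

```python
def ema_update(m, grad, beta=0.9):
    # Exponential moving average: blend the old estimate with the new gradient.
    # Adam uses this form for its first moment (and a squared variant for the second).
    return beta * m + (1.0 - beta) * grad

# With a constant gradient of 1.0, the EMA warms up toward 1.0:
m = 0.0
for _ in range(50):
    m = ema_update(m, 1.0)
# m equals 1 - 0.9**50, i.e. just under 1.0
```

The slow warm-up from zero is exactly why Adam applies a bias correction in its early steps.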
I therefore wrote a proper scientific article about it and also published its Julia source code, so you can easily try it out if you like:
The paper can be found via its DOI:
By the way, it is impressive how quickly the students learned Julia (and also how fast it is compared to PyTorch). About a third of my lecture was live coding, and the interactive PlotlyJS plots were a killer feature for that.