I've implemented a Transformer model using Flux 0.8.2, based on The Annotated Transformer.
If this is of interest, I'm happy to contribute it (though I don't know where or how best to do that). In the meantime it's up on GitHub, so have a look.
I'm brand new to Julia, so I could use advice on code style and on improving performance. Both the token-processing speed and the convergence rate seem slower than the Annotated Transformer, which is written in Python: the trivial demo (included in the source) converges after roughly 30 epochs of 20 batches with 30 example sequences each, whereas the reference implementation converges after about 6 comparable epochs. I must be missing something, but I'm even less experienced with Python, and although I've scoured the code, I haven't found any meaningful differences from the reference.
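Schematically, the demo's training loop is shaped roughly like this (identifiers simplified, not the exact code from the repo; `transformer`, `make_demo_batch`, and `loss` stand in for the real model, batching, and loss functions), in case the problem is something obvious about how I'm driving Flux:

```julia
# Rough sketch of the demo loop using Flux 0.8-era APIs; names are placeholders.
using Flux

opt = ADAM()                         # plain Adam here; the real code may use a warmup schedule
ps  = Flux.params(transformer)       # trainable parameters of the (placeholder) model

for epoch in 1:30                                     # ~30 epochs before the demo converges
    data = [make_demo_batch(30) for _ in 1:20]        # 20 batches of 30 example sequences each
    Flux.train!((src, trg) -> loss(transformer, src, trg), ps, data, opt)
end
```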