I am looking to implement the following paper in Julia: 2018_commonsense_ZhouHao_3_TYVQ7Iq.pdf (tsinghua.edu.cn). The existing implementation targets a very old version of TensorFlow, so I thought it would be nice to try implementing it in Julia.
How good are the packages for Transformer-based modelling (attention/self-attention, GRU, etc.), and how well do they support GPU acceleration? I also plan to implement complex-valued neural networks (at least for parts of the model). Would that be feasible with the current libraries, and where should I look to modify the non-linearity (ReLU), dropout, iteration, etc. (and their gradients)? Also, what about other aspects such as tokenizers?
I am unsure whether I should implement this in PyTorch (which supports complex numbers in its latest versions) or in Julia. I find Julia nicer to work with in general, but I have not done much machine learning in it.
Are there any end-to-end tutorials in Julia that cover implementing tokenizers, the models, the output word-embedding mapper, and evaluation metrics?